Abstract. We study a simple solvable model describing the genesis of monomer sequences for hetero-polymers 
(such as proteins), as the result of the equilibration of a slow stochastic genetic selection process which is assumed 
to be driven by the competing demands of functionality and reproducibility of the polymer's folded structure. 
Since reproducibility is defined in terms of properties of the folding process, one is led to the analysis of the 
coupled dynamics of (fast) polymer folding and (slow) genetic sequence selection. For the present mean-field 
model this analysis can be carried out using the finite-dimensional replica method, leading to exact results for 
(first- and second-order) transitions and to rich phase diagrams. 



PACS numbers: 61.41.+e, 75.10.Nr 
1. Introduction 

The functionality of a protein depends largely on the shape of its 3D native state. Determining this shape 
from a protein's sequence of amino-acids is called the 'protein folding problem'. From a medical point of view 
it is desirable to be able to solve the related 'inverse folding problem': given a 3D structure, find a sequence 
of amino-acids which would have this as its native state. Unfortunately X-ray crystallography (the standard 
tool for determining native states) is very time-consuming, and will not deliver the native states of the many 
amino-acid sequences collected through the Human Genome Project for years to come. Furthermore, it 
does not reveal the mechanics of the folding process. Molecular dynamics simulations are also extremely slow; 
only the first 10-50 ns of protein folding processes could so far be simulated. In spite of ongoing 
hardware improvements, it will take long before real progress is made. This points to the need for solvable 
mathematical models, designed to capture the essentials of proteins, to shed more light on the nature of the 
folding process and the role of its parameters. 

The main complications in solving protein models are the chain constraint and the presence of disorder, 
embodied in the amino-acid sequences. Natural amino-acid sequences have evolved genetically, 
driven mainly by the two (competing) demands of structure reproducibility and functionality. They are not 
random (random hetero-polymers usually do not fold into unique shapes), whereas most disordered systems 
techniques are based on exploiting self-averaging properties of random disorder. It is not yet understood 
what distinguishes a random amino-acid sequence from a natural one. Our objective is to define and study a 
solvable model describing the interplay between the genetic forces and folding processes, as a first step towards 
understanding the genesis and statistics of natural amino-acid sequences. 

Our model is a simple mean-field hetero-polymer whose microscopic state is described by two degrees 
of freedom per monomer: $\phi_i \in [0, 2\pi]$, giving the orientation of monomer $i$ relative to the backbone, and 
$\eta_i \in \{-1, 1\}$, giving its polarity (i.e. hydrophobic vs. hydrophilic)‡. The evolution of the orientation variables 
represents the folding; that of the polarity variables (which involves changing monomer species) represents 
genetic evolution. The latter clearly takes place over much larger time-scales than the former, and is the result 
of an interplay between minimizing an energetic cost determined by the average folding quality, constraints 
on the monomer composition, and noise. Models with adiabatically separated time-scales have been studied 
and solved using finite-dimensional replica theories, but so far mainly in the context of neural networks and 
spin-glasses [10, 11, 12, 13, 14, 15, 16]. We show how these techniques can also be used to study analytically 
the coupled dynamics of (fast) folding and (slow) genetic sequence selection in our present model. 

‡ Thus, in addition to the mean-field nature of the forces and the absence of a chain constraint, our model simplifies biological 
reality further by reducing the orientation degrees of freedom of individual amino-acids to one, and the characterization of the 
physical properties of individual amino-acids to a binary number. 

We first define our model and solve it in the stationary state, describing equilibrium at the largest (genetic) 
time-scale, via the finite-dimensional replica method. Analysis of the resulting macroscopic equations leads 
to general results regarding the ground state and (first- and second-order) phase transitions. We inspect our 
equations more closely for specific (small) values of the number q of allowed local monomer orientations, and 
generate phase diagrams via a combination of analytical and numerical techniques. These are found to be very 
rich, and involve various types of single-state and multiple-state swollen and compact phases. 



2. The model and its solution 



2.1. Model definitions 

We study hetero-polymer models as in [17], consisting of $N$ monomers labelled $i = 1 \ldots N$. The spatial degrees of 
freedom of the monomers are angles $\phi_i \in C \subset [0, 2\pi]$, describing their orientations around a polymer backbone; 
each monomer $i$ can take only a finite number of positions on a circle. The physical properties of a monomer 
$i$ are defined by its species (i.e. amino-acid type). Here we restrict ourselves by taking into consideration only 
a monomer's polarity $\eta_i \in \mathbb{R}$, where $\eta_i > 0$ for hydrophilic monomers and $\eta_i < 0$ for hydrophobic ones. A 
monomer sequence is thus fully described by the vector $\boldsymbol{\eta} = (\eta_1, \ldots, \eta_N) \in \mathbb{R}^N$, whereas the spatial configuration 
(or 'conformation') of the system is described by the vector $\boldsymbol{\phi} = (\phi_1, \ldots, \phi_N) \in C^N$. 

On the time-scales of a single evolutionary generation the sequence $\boldsymbol{\eta}$ is fixed, and the only allowed process 
is folding, i.e. evolution of the $\boldsymbol{\phi}$. We consider here only the dominant folding force: compactification of 
the polymer via 'shielding' from the solvent of its hydrophobic monomers. A simple well-known type of 
phenomenological Hamiltonian to describe this effect is (see e.g. [18, 19, 20]) 

$$ H(\boldsymbol{\phi}, \boldsymbol{\eta}) = -\frac{J}{N} \sum_{ij} \eta_i \eta_j\, \delta[\phi_i - \phi_j] \tag{1} $$

with $J > 0$. The rationale is that efficient shielding of hydrophobic monomers and exposure of hydrophilic ones 
requires separation of the species; ideally with the two polarity types oriented at opposite sides of the chain (here 
a single bend of the chain would ensure shielding). The Hamiltonian (1) punishes monomer pairs with opposite 
polarity but identical orientation relative to the backbone (which make polarity-based compactification more 
difficult), and favors pairs with identical polarity and orientation. The mathematical simplifications induced 
by the separable character and the infinite range of the monomer interactions in (1) will allow us to solve the 
super-imposed slow genetic process to be introduced below. 
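
The separable character of (1) can be made concrete with a minimal sketch. We assume here that the prefactor in (1) is $J/N$ and that the sum runs over all ordered pairs (including $i = j$), which is the normalization implied by $T_c = 2J/q$ in (17); orientations are represented by integer labels $0, \ldots, q-1$ rather than angles, and the function names are ours. Under these assumptions the double sum equals $-\frac{1}{4}JN\sum_\phi L_\phi^2$, with $L_\phi = \frac{2}{N}\sum_i \eta_i \delta_{\phi,\phi_i}$.

```python
def hamiltonian_pairs(phi, eta, J=1.0):
    """Direct double sum: H = -(J/N) * sum_{i,j} eta_i * eta_j * delta[phi_i - phi_j]."""
    N = len(phi)
    total = sum(eta[i] * eta[j]
                for i in range(N) for j in range(N)
                if phi[i] == phi[j])
    return -J * total / N


def hamiltonian_order_parameters(phi, eta, q, J=1.0):
    """Same energy as -(J*N/4) * sum_k L_k**2, with L_k = (2/N) * sum_i eta_i * delta[k, phi_i]."""
    N = len(phi)
    L = [0.0] * q
    for k, e in zip(phi, eta):
        L[k] += 2.0 * e / N
    return -J * N / 4.0 * sum(x * x for x in L)


# Two monomers of opposite polarity: sharing an orientation is energetically
# unfavourable compared with occupying different orientations.
mixed = hamiltonian_pairs([0, 0], [1, -1])      # same orientation
separated = hamiltonian_pairs([0, 1], [1, -1])  # different orientations
```

The second form is the one exploited in the mean-field solution: the energy depends on the microscopic state only through the $q$ numbers $L_\phi$.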

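For a fixed realization of the sequence $\boldsymbol{\eta}$, the equilibrium statistics of the orientation variables, at 
temperature $T = \beta^{-1}$, are characterized by the partition function 

$$ Z[\boldsymbol{\eta}] = \mathrm{Tr}_{\boldsymbol{\phi}}\, e^{-\beta H(\boldsymbol{\phi}, \boldsymbol{\eta})} \tag{2} $$

In addition to the folding process we now introduce a stochastic dynamics for the sequences $\boldsymbol{\eta}$, which reflects 
the demands of structure reproducibility and structure functionality. The former is measured by the achieved 
degree of species separation, i.e. by (1). Upon describing the degree of functionality of a sequence $\boldsymbol{\eta}$ by a 
potential $V(\boldsymbol{\eta})$, we arrive at the Langevin equation 

$$ \frac{d}{dt}\eta_i = -\frac{\partial}{\partial \eta_i}\left\{ H(\boldsymbol{\phi}, \boldsymbol{\eta}) + V(\boldsymbol{\eta}) \right\} + \xi_i \tag{3} $$

with zero-average Gaussian random forces obeying $\langle \xi_i(t)\xi_j(t') \rangle = 2\tilde{T}\delta_{ij}\delta(t - t')$ (which introduces a second 
'temperature' $\tilde{T} = \tilde{\beta}^{-1}$). Our further discussion will be restricted to the stationary state of the genetic process 
(3). Since the genetic process is adiabatically slow compared to the folding process, we may replace (3) by its 
average over the equilibrium folding statistics $p(\boldsymbol{\phi}|\boldsymbol{\eta}) \sim e^{-\beta H(\boldsymbol{\phi}, \boldsymbol{\eta})}$, i.e. 

$$ \frac{d}{dt}\eta_i = -\frac{\partial}{\partial \eta_i}\left\{ V(\boldsymbol{\eta}) - \frac{1}{\beta}\log Z[\boldsymbol{\eta}] \right\} + \xi_i $$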

The equilibrium measure of the genetic process is now also of the Boltzmann form, with a 'genetic' Hamiltonian 
$H(\boldsymbol{\eta})$ (which is the sum of the free energy of the folding process, given the sequence $\boldsymbol{\eta}$, and the functionality 
potential $V(\boldsymbol{\eta})$), and characterized by an associated genetic free energy per monomer $f_N$ (with the 'folding 
partition function' defined in (2)): 

$$ H(\boldsymbol{\eta}) = V(\boldsymbol{\eta}) - \frac{1}{\beta}\log Z[\boldsymbol{\eta}], \qquad f_N = -\frac{1}{\tilde{\beta} N}\log \mathrm{Tr}_{\boldsymbol{\eta}}\, e^{-\tilde{\beta} H(\boldsymbol{\eta})} \tag{4} $$

We now make a specific choice for the set $C$ of allowed single-monomer angles, viz. $C = \{(2k+1)\pi/q,\ k = 
0, \ldots, q-1\}$ ($q$ possible orientations per monomer), and we choose the simplest non-trivial type of functionality 
potential $V(\boldsymbol{\eta}) = \sum_i \mu_i \eta_i$. In addition we switch to the Ising version of (4), i.e. $\eta_i \in \{-1, 1\}$ for all $i$ (as 
opposed to $\eta_i \in \mathbb{R}$)§. In combination, studying the stationary state at the largest (genetic) time-scale has now 
been reduced to the calculation of the free energy per monomer $f_N$: 



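$$ f_N = -\frac{1}{\tilde{\beta} N}\log \sum_{\boldsymbol{\eta}} e^{-\tilde{\beta}\, \boldsymbol{\mu}\cdot\boldsymbol{\eta}} \Big[ \sum_{\boldsymbol{\phi}} e^{-\beta H(\boldsymbol{\phi}, \boldsymbol{\eta})} \Big]^{\tilde{\beta}/\beta} \tag{5} $$

with $\boldsymbol{\eta} \in \{-1, 1\}^N$ and with $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_N)$. 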

The energetic forces in our model favor species-based monomer separation (due to the fast process) and 
species selection controlled by the disorder variables $\{\mu_i\}$ (due to the slow process). The ease with which these 
two aims can be achieved simultaneously depends on the statistics of the single-site genetic forces $\{\mu_i\}$; the 
relative importance of the two objectives is controlled by the ratio $\tilde{\beta}/\beta$ of associated noise levels. Entropic forces 
act in the opposite direction, favoring randomly distributed monomer orientations and species. The equilibrium 
state of the system must therefore be characterized by disorder-averaged order parameters which measure the 
overall joint distribution of orientation- and species variables along the chain. 



2.2. Solution via the finite-n replica method 

Expression (5), which is of the typical form found for systems with disparate time-scales, can be evaluated 
using the replica method. The temperature ratio $n = \tilde{\beta}/\beta$ is first regarded as integer, which allows us to write 
$Z^n[\boldsymbol{\eta}]$ as the partition function of an $n$-fold replicated system, followed by analytic continuation to non-integer 
$n$ later (if needed). In the special case $n \to 0$ the polarity numbers $\{\eta_i\}$ reduce to quenched disorder variables, 
and we return to a simplified version of the model in [17]. For $n = 1$ one recovers the annealed case (although 
the time-scales remain completely disparate). The limit $n \to \infty$ corresponds to dominant coupling of the two 
processes: here the polarity dynamics is fully deterministic. When referring to temperature in the remainder of 
this paper we will mean $T$, with $\tilde{T}$ linked to $T$ via $\tilde{T} = T/n$. 

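Insertion into (5) of $\sum_{\phi} \delta_{\phi, \phi_i^\alpha} = 1$, followed by exponent linearization via Gaussian integrals, gives 
an integral over auxiliary fields $\{Z_\phi^\alpha\}$, one for each replica $\alpha = 1, \ldots, n$ and each angle $\phi \in C$. 
We carry out the summations over the polarity variables (first) and over the replicated angles, and take 
the thermodynamic limit. This leads us to the following result for the asymptotic free energy per monomer 
$f = \lim_{N\to\infty} f_N$, which is evaluated by steepest descent: 

$$ \tilde{\beta} f = \mathrm{extr}_{\{Z_\phi^\alpha\}} \left\{ \frac{1}{4\beta J}\sum_{\alpha\phi} (Z_\phi^\alpha)^2 - \Big\langle \log \sum_{\eta = \pm 1} e^{-\tilde{\beta}\mu\eta} \sum_{\boldsymbol{\phi} \in C^n} e^{\eta \sum_\alpha Z^\alpha_{\phi_\alpha}} \Big\rangle_\mu \right\} \tag{6} $$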



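§ Note that this implies that the genetic dynamics (3) will be replaced by either a Glauber-type dynamics, or by a 'soft-spin' 
Langevin equation (involving the familiar parametrized double-well potential for each $\eta_i$) from which the binary version of the 
model can be obtained as a limit. 

in which $\boldsymbol{\phi} = (\phi_1, \ldots, \phi_n) \in C^n$ and $\langle g(\mu) \rangle_\mu = \int d\mu\, P(\mu) g(\mu)$, with $P(\mu) = \lim_{N\to\infty} N^{-1}\sum_i \delta[\mu - \mu_i]$. The 
location $\{Z_\phi^\alpha\}$ of the extremum in (6) is to be calculated by solving the following non-linear saddle-point 
equations: 

$$ Z_\phi^\alpha = 2\beta J \left\langle \frac{ e^{-\tilde{\beta}\mu}\, e^{Z_\phi^\alpha} \prod_{\gamma \neq \alpha} \big( \sum_\psi e^{Z_\psi^\gamma} \big) - e^{\tilde{\beta}\mu}\, e^{-Z_\phi^\alpha} \prod_{\gamma \neq \alpha} \big( \sum_\psi e^{-Z_\psi^\gamma} \big) }{ e^{-\tilde{\beta}\mu} \prod_{\gamma} \big( \sum_\psi e^{Z_\psi^\gamma} \big) + e^{\tilde{\beta}\mu} \prod_{\gamma} \big( \sum_\psi e^{-Z_\psi^\gamma} \big) } \right\rangle_\mu \tag{7} $$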

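The physical meaning of the order parameters $\{Z_\phi^\alpha\}$ can be determined by adding generating terms to the 
Hamiltonian (1), which measure the overall polarity at given positions $\phi$: 

$$ H(\boldsymbol{\phi}, \boldsymbol{\eta}) \to H(\boldsymbol{\phi}, \boldsymbol{\eta}) + N\sum_\phi x_\phi L_\phi(\boldsymbol{\phi}, \boldsymbol{\eta}), \qquad L_\phi(\boldsymbol{\phi}, \boldsymbol{\eta}) = \frac{2}{N}\sum_{i=1}^{N} \eta_i \delta_{\phi, \phi_i} \tag{8} $$

Upon working out the general identity $\lim_{x \to 0} \partial f/\partial x_\phi = L_\phi = \overline{\langle L_\phi(\boldsymbol{\phi}, \boldsymbol{\eta}) \rangle}$, with $\langle f(\boldsymbol{\eta}, \boldsymbol{\phi}) \rangle$ denoting 
conformational equilibrium averages for fixed $\{\eta_i\}$ and $\overline{f(\boldsymbol{\eta})}$ denoting polarity equilibrium averages, it follows 
that 

$$ \frac{1}{n}\sum_\alpha Z_\phi^\alpha = \beta J L_\phi, \qquad L_\phi = \lim_{N\to\infty} \overline{\langle L_\phi(\boldsymbol{\phi}, \boldsymbol{\eta}) \rangle} \tag{9} $$

Thus $\sum_\alpha Z_\phi^\alpha$ is proportional to the disorder-averaged equilibrium expectation value of the average polarity of those 
monomers which are oriented to angle $\phi$. For future use we will also define the overall average equilibrium 
polarity $p$: 

$$ p = \lim_{N\to\infty} \frac{1}{N}\sum_i \overline{\langle \eta_i \rangle} = \frac{1}{2}\sum_\phi L_\phi \tag{10} $$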

2.3. The replica symmetric solution 

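In Appendix A we show that the replica-symmetric (RS) solution of our saddle-point equations (7), where 
$Z_\phi^\alpha = \beta J L_\phi$ for all $\alpha$, is locally stable against replica-symmetry breaking fluctuations. For such RS solutions 
one finds (6) and (7) reducing to, respectively: 

$$ f_{RS} = \min_{\{L_\phi\}} \left\{ \frac{J}{4}\sum_\phi L_\phi^2 - \frac{1}{\beta n}\Big\langle \log\Big[ e^{-\beta n \mu}\Big( \sum_\phi e^{\beta J L_\phi} \Big)^n + e^{\beta n \mu}\Big( \sum_\phi e^{-\beta J L_\phi} \Big)^n \Big] \Big\rangle_\mu \right\} \tag{11} $$

$$ L_\phi = 2\left\langle \frac{ e^{-\beta n \mu}\, e^{\beta J L_\phi} \big( \sum_\psi e^{\beta J L_\psi} \big)^{n-1} - e^{\beta n \mu}\, e^{-\beta J L_\phi} \big( \sum_\psi e^{-\beta J L_\psi} \big)^{n-1} }{ e^{-\beta n \mu} \big( \sum_\psi e^{\beta J L_\psi} \big)^{n} + e^{\beta n \mu} \big( \sum_\psi e^{-\beta J L_\psi} \big)^{n} } \right\rangle_\mu \tag{12} $$

Summation over $\phi$ in (12) gives an alternative expression for the RS average polarity $p$ (10): 

$$ p = \left\langle \frac{ e^{-\beta n \mu} \big( \sum_\phi e^{\beta J L_\phi} \big)^{n} - e^{\beta n \mu} \big( \sum_\phi e^{-\beta J L_\phi} \big)^{n} }{ e^{-\beta n \mu} \big( \sum_\phi e^{\beta J L_\phi} \big)^{n} + e^{\beta n \mu} \big( \sum_\phi e^{-\beta J L_\phi} \big)^{n} } \right\rangle_\mu \tag{13} $$

Equation (13) allows us to write (12) in the form 

$$ L_\phi = (1 + p)\frac{e^{\beta J L_\phi}}{\sum_\psi e^{\beta J L_\psi}} - (1 - p)\frac{e^{-\beta J L_\phi}}{\sum_\psi e^{-\beta J L_\psi}} \tag{14} $$

This latter expression (14) was also found in [17]; however, here the average polarity $p$ is an order parameter, 
to be solved simultaneously with the $\{L_\phi\}$ from (13), whereas in [17] it was a fixed control parameter. 

Since we have shown replica symmetry to be locally stable, for any choice of model parameters, we will 
henceforth restrict ourselves to RS saddle-points only (there is no evidence for discontinuous RSB transitions). 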
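
As an illustration of how the RS equations might be solved in practice, the following sketch iterates (13) and (14) as a damped fixed-point map for a discrete force distribution $P(\mu) = \sum_k w_k \delta[\mu - \mu_k]$. The damped iteration, the function names and the parameter values are our own assumptions and not part of the analysis above; such a scheme finds at most one saddle-point (the one whose basin contains the initial condition) and need not converge at low temperatures.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def rs_step(L, beta, n, J, mus, weights):
    """One application of the right-hand sides of (13) and (14)."""
    a_plus = [beta * J * x for x in L]
    a_minus = [-beta * J * x for x in L]
    lse_plus = logsumexp(a_plus)
    lse_minus = logsumexp(a_minus)
    # (A - B)/(A + B) = tanh((log A - log B)/2), with
    # log A = -beta*n*mu + n*lse_plus and log B = +beta*n*mu + n*lse_minus.
    p = sum(w * math.tanh(0.5 * n * (lse_plus - lse_minus) - beta * n * mu)
            for mu, w in zip(mus, weights))
    L_new = [(1 + p) * math.exp(ap - lse_plus) - (1 - p) * math.exp(am - lse_minus)
             for ap, am in zip(a_plus, a_minus)]
    return L_new, p


def solve_rs(L0, beta, n, J, mus, weights, iters=2000, damping=0.5):
    """Damped fixed-point iteration (an assumed numerical scheme)."""
    L = list(L0)
    p = 0.5 * sum(L)
    for _ in range(iters):
        L_new, p = rs_step(L, beta, n, J, mus, weights)
        L = [damping * new + (1 - damping) * old for new, old in zip(L_new, L)]
    return L, p


# High-temperature example with q = 3 and a single force value mu = 0.3.
L_fix, p_fix = solve_rs([0.3, -0.2, 0.1], beta=0.5, n=1.0, J=1.0,
                        mus=[0.3], weights=[1.0])
```

In this high-temperature example the iteration should relax to a state with identical $L_\phi$ for all $\phi$, irrespective of the non-uniform initial condition.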
3. General analytical results 

3.1. The high-temperature state 

Let us first identify the high-temperature state. Expansion of the free energy (11) for fixed $n > 0$ gives 
$f_{RS} = -\frac{1}{2}\beta n \langle \mu^2 \rangle_\mu - \frac{1}{\beta}\log q - \frac{1}{\beta n}\log 2 + \frac{1}{4}J\, \mathrm{extr}_{\{L_\phi\}} \Phi[\{L_\phi\}] + O(\beta^2)$, with (using $\sum_\phi 1 = q$): 

$$ \Phi[\{L_\phi\}] = \sum_\phi \Big[ L_\phi + \frac{2\beta n}{q}\langle \mu \rangle_\mu \Big]^2 $$

It follows that the saddle-point of $f$ is of the form 

$$ L_\phi = -\frac{2\beta n}{q}\langle \mu \rangle_\mu + O(\beta^2) \qquad (\forall \phi) $$

For sufficiently high temperatures the order parameters are thus independent of $\phi$. This was to be expected 
in view of the invariance of (11) under arbitrary permutations of the available monomer orientations $\phi \in C$. 



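Insertion of the symmetric ansatz $L_\phi = L$ (which is a saddle-point of $f_{RS}$ at any temperature) directly 
into (11, 12) gives 

$$ f_{sym} = \frac{Jq}{4}L^2 - \frac{1}{\beta}\log q - \frac{1}{\beta n}\Big\langle \log\Big[ e^{\beta n[JL - \mu]} + e^{\beta n[\mu - JL]} \Big] \Big\rangle_\mu \tag{15} $$

$$ L = \frac{2}{q}\big\langle \tanh[\beta n(JL - \mu)] \big\rangle_\mu \tag{16} $$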



In terms of the average equilibrium polarity (10), which here reduces to $p = qL/2$, equation (16) can be written 
alternatively as $p = \langle \tanh[\beta n(2Jp/q - \mu)] \rangle_\mu$. The point where this RS high-temperature state becomes 
locally unstable can be inferred from the results of Appendix A. In particular (A.7) gives the condition for a 
zero eigenvalue of the RS Hessian as 

$$ \det\Big[ nq(K - L) + q(L - M) + \frac{qT}{2J}\mathbb{1} \Big] = 0 $$

Working out the various matrices for the symmetric state $L_\phi = L$, with (16), gives 

$$ K_{\phi\phi'} = q^{-2}\big\langle \tanh^2[n\beta(JL - \mu)] \big\rangle_\mu, \qquad L_{\phi\phi'} = q^{-2}, \qquad M_{\phi\phi'} = q^{-1}\delta_{\phi\phi'} $$



In the symmetric state the Hessian has two distinct eigenvalues, one relating to changes in the amplitude of 
the symmetric state, and one relating to changes orthogonal to the symmetric state, with associated stability 
conditions: 

locally stable in non-symmetric directions: $qT/2J > 1$ 

locally stable in symmetric direction: $qT/2J > n\big(1 - \langle \tanh^2[n\beta(JL - \mu)] \rangle_\mu\big)$ 

For $n < 1$ it immediately follows that, as we lower the temperature, the first stability condition is always 
violated before the second can be. Violation of the second condition implies destabilization in favour of an 
alternative symmetric solution (i.e. the creation of an alternative solution of equation (16)). Thus symmetric 
states $L_\phi = L\ \forall \phi$ defined by (16) become locally unstable against symmetry-breaking fluctuations at 

$$ T_c = 2J/q \tag{17} $$

Unless preceded by a first order transition (which will also turn out to be possible in certain parameter regimes), 
this describes a second-order phase transition from a non-separated state into one where the hydrophobic and 
hydrophilic monomers start to prefer distinct orientations. Within the context of our model, the former is to 
be regarded as a swollen state (S) for the polymer, and the second as a compact state (C). Note that expression 
(17), which is identical to that found in [17], is independent of $n$. 
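
The two stability conditions of the symmetric state are easily evaluated numerically. The sketch below assumes a discrete force distribution with values `mus` and probabilities `weights`, takes a solution $p = qL/2$ of equation (16) as given rather than solving for it, and uses a function name of our own choosing.

```python
import math


def symmetric_state_stability(T, n, J, q, p, mus, weights):
    """Return (stable in non-symmetric directions, stable in symmetric direction).

    `p` must be a solution of p = < tanh[beta*n*(2*J*p/q - mu)] >_mu; it is
    not checked here.
    """
    beta = 1.0 / T
    lhs = q * T / (2.0 * J)
    tanh_sq = sum(w * math.tanh(n * beta * (2.0 * J * p / q - mu)) ** 2
                  for mu, w in zip(mus, weights))
    return lhs > 1.0, lhs > n * (1.0 - tanh_sq)


# For mu = 0 the state p = 0 always solves the symmetric equation.
above_tc = symmetric_state_stability(T=1.0, n=0.5, J=1.0, q=3, p=0.0, mus=[0.0], weights=[1.0])
below_tc = symmetric_state_stability(T=0.5, n=0.5, J=1.0, q=3, p=0.0, mus=[0.0], weights=[1.0])
large_n = symmetric_state_stability(T=1.0, n=3.0, J=1.0, q=3, p=0.0, mus=[0.0], weights=[1.0])
```

For $n < 1$ the first condition is the more restrictive one, as stated above; for $n > 1$ the $p = 0$ solution can lose stability within the symmetric subspace while $T$ is still above $2J/q$.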
3.2. The replica-symmetric ground state for n > 0 



We now study the limit $\beta \to \infty$ and calculate the ground state of our system, for replica-symmetric solutions 
and fixed $n > 0$. We define $\ell_+ = \max_\phi \ell_\phi$ and $\ell_- = \min_\phi \ell_\phi$ (so $\ell_+ \geq \ell_-$) and the number of locations $\phi$ for 
which $\ell_\phi = \ell_\pm$ by $q_\pm$, respectively (so $q_\pm \geq 1$). The remaining intermediate values for $\ell_\phi$ define the set 
$U = \{\phi\,|\ \ell_- < \ell_\phi < \ell_+\}$. Similarly we define $L_\pm$ as the values of $\ell_\pm$ at the relevant saddle-point. The ground 
state is the solution of the following minimization problem: 

$$ E_0/J = \min_{\{\ell_\phi\}} \Big\{ \frac{1}{4}\sum_\phi \ell_\phi^2 - \big\langle \max\big[ \ell_+ - \mu/J,\ \mu/J - \ell_- \big] \big\rangle_\mu \Big\} $$

Suppose first that $L_+ = L_-$, which implies $L_\phi = L = 2p/q$ for all $\phi$ (i.e. a fully symmetric ground state). Now 

$$ E_0/J = \min_{p} \Big\{ p^2/q - \big\langle |2p/q - \mu/J| \big\rangle_\mu \Big\} $$

Next we consider the case $L_+ > L_-$. Minimization with respect to those $\ell_\phi$ for which $\phi \in U$, for a given 
realization of $\{q_+, q_-\}$, reveals that $\ell_\phi = 0$ for $\phi \in U$. We can then minimize further with respect to $\{q_\pm\}$ and 
find $q_+ = q_- = 1$. Given our previous identification $p = \frac{1}{2}\sum_\phi L_\phi$, here reducing to $p = \frac{1}{2}(L_+ + L_-)$, we write 
$\ell_\pm = p \pm z$, upon which our problem takes the form 

$$ E_0/J = \min_{\{p, z\}} \Big\{ \frac{1}{2}p^2 + \frac{1}{2}z^2 - z - \big\langle |p - \mu/J| \big\rangle_\mu \Big\} $$

We conclude that the minimum corresponds to $z = 1$. Both candidate ground state solutions can thus be 
expressed in terms of the monotonically increasing function 

$$ A(z) = \min_{y} \Big\{ \frac{1}{2}z y^2 - \big\langle |y - \mu/J| \big\rangle_\mu \Big\} $$

as follows 

symmetric: $L = L(1, 1, \ldots, 1, 1)$, $E_0/J = A(q/2)$ 

non-symmetric: $L = (p - 1, 0, \ldots, 0, p + 1)$, $E_0/J = A(1) - 1/2$ 

Since $q \geq 2$ we may conclude that the ground state is the non-symmetric solution, so 

$$ E_0/J = \min_{p}\, e(p), \qquad e(p) = \frac{1}{2}p^2 - \frac{1}{2} - \big\langle |p - \mu/J| \big\rangle_\mu \tag{18} $$

Note that $e(p)$ is symmetric as soon as $P(\mu)$ is symmetric. The ground state is one where at most two sites $\phi_\pm$ 
are occupied by monomers, and there are at most two nonzero values $L_\pm$ for the order parameters $L_\phi$. 
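
The ground state function in (18) is simple enough to explore numerically. The sketch below assumes the form $e(p) = \frac{1}{2}p^2 - \frac{1}{2} - \langle |p - \mu/J| \rangle_\mu$ and a discrete force distribution; the grid search for local minima and all names are illustrative choices of ours.

```python
def e_of_p(p, J, mus, weights):
    """e(p) = p**2/2 - 1/2 - < |p - mu/J| >_mu for a discrete P(mu)."""
    return 0.5 * p * p - 0.5 - sum(w * abs(p - mu / J) for mu, w in zip(mus, weights))


def local_minima(J, mus, weights, p_min=-2.0, p_max=2.0, steps=4001):
    """Strict local minima of e(p) on a uniform grid, as (p, e(p)) pairs."""
    grid = [p_min + (p_max - p_min) * k / (steps - 1) for k in range(steps)]
    vals = [e_of_p(p, J, mus, weights) for p in grid]
    return [(grid[k], vals[k]) for k in range(1, steps - 1)
            if vals[k] < vals[k - 1] and vals[k] < vals[k + 1]]


# Single force value mu = 0.5 with J = 1: two local minima, at p = -1 and p = +1.
weak_force = local_minima(J=1.0, mus=[0.5], weights=[1.0])
# Single force value mu = 1.5 with J = 1: only the minimum at p = -1 survives.
strong_force = local_minima(J=1.0, mus=[1.5], weights=[1.0])
```

For a single force value the lower of the two minima has energy $-1 - |\mu|/J$, attained at $p = -\mathrm{sgn}[\mu]$.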



3.3. The limits n → 0, n → 1 and n → ∞, for arbitrary T 



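If we take $n \to 0$ in equations (11, 12, 13) (which implies fully random evolution of the polarity variables $\eta_i$) we 
find, as expected, the $p = 0$ version of the solution for the long-range limit of [17]: 

$$ \lim_{n \to 0}\Big[ f_{RS} + \frac{\log 2}{\beta n} \Big] = \min_{\{L_\phi\}} \Big\{ \frac{J}{4}\sum_\phi L_\phi^2 - \frac{1}{2\beta}\log\Big( \sum_\phi e^{\beta J L_\phi} \Big) - \frac{1}{2\beta}\log\Big( \sum_\phi e^{-\beta J L_\phi} \Big) \Big\}, \qquad L_\phi = \frac{e^{\beta J L_\phi}}{\sum_\psi e^{\beta J L_\psi}} - \frac{e^{-\beta J L_\phi}}{\sum_\psi e^{-\beta J L_\psi}} $$

The (constant) contribution $-\log 2/\beta n$ to the free energy per monomer represents the entropy of the polarity 
variables. Note that this solution is independent of the distribution $P(\mu)$ of the forces $\mu_i$, as it should be. 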

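Putting $n \to 1$ effectively reduces the polarity variables to annealed ones, in spite of the disparate time-scales 
in the model. Now we find (11, 13, 14) taking the form 

$$ \lim_{n \to 1} f_{RS} = \min_{\{L_\phi\}} \Big\{ \frac{J}{4}\sum_\phi L_\phi^2 - \frac{1}{\beta}\Big\langle \log \sum_\phi 2\cosh[\beta(J L_\phi - \mu)] \Big\rangle_\mu \Big\} $$

$$ L_\phi = (1 + p)\frac{e^{\beta J L_\phi}}{\sum_\psi e^{\beta J L_\psi}} - (1 - p)\frac{e^{-\beta J L_\phi}}{\sum_\psi e^{-\beta J L_\psi}}, \qquad p = \left\langle \frac{\sum_\phi \sinh[\beta(J L_\phi - \mu)]}{\sum_\phi \cosh[\beta(J L_\phi - \mu)]} \right\rangle_\mu $$

Somewhat unexpectedly, these equations are still non-trivial, and retain a rich bifurcation phenomenology as 
we will show later for specific choices of the number $q$ of possible monomer orientations. 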
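
The limits $n \to 0$ and $n \to 1$ can be checked numerically against the general RS expression. The sketch below assumes that the function being minimized in (11) is $\frac{J}{4}\sum_\phi L_\phi^2 - \frac{1}{\beta n}\langle \log[e^{-\beta n\mu}(\sum_\phi e^{\beta J L_\phi})^n + e^{\beta n\mu}(\sum_\phi e^{-\beta J L_\phi})^n]\rangle_\mu$, evaluates it at an arbitrary (not necessarily stationary) point $\{L_\phi\}$ for a discrete force distribution, and compares with the corresponding limiting forms. All names and parameter values are ours.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def f_rs(L, beta, n, J, mus, weights):
    """RS free energy function (before minimization over L) for general n."""
    lse_plus = logsumexp([beta * J * x for x in L])
    lse_minus = logsumexp([-beta * J * x for x in L])
    avg = sum(w * logsumexp([-beta * n * mu + n * lse_plus,
                             beta * n * mu + n * lse_minus])
              for mu, w in zip(mus, weights))
    return J / 4.0 * sum(x * x for x in L) - avg / (beta * n)


def f_annealed(L, beta, J, mus, weights):
    """The n = 1 form: J/4 * sum L^2 - (1/beta) < log sum_phi 2cosh[beta(J L_phi - mu)] >."""
    avg = sum(w * math.log(sum(2.0 * math.cosh(beta * (J * x - mu)) for x in L))
              for mu, w in zip(mus, weights))
    return J / 4.0 * sum(x * x for x in L) - avg / beta


def f_quenched(L, beta, J):
    """The n -> 0 form, with the divergent constant -log(2)/(beta*n) removed."""
    lse_plus = logsumexp([beta * J * x for x in L])
    lse_minus = logsumexp([-beta * J * x for x in L])
    return J / 4.0 * sum(x * x for x in L) - (lse_plus + lse_minus) / (2.0 * beta)


L_test = [0.4, -0.1, 0.25]
args = dict(beta=0.8, J=1.3, mus=[-0.5, 0.7], weights=[0.5, 0.5])
diff_n1 = f_rs(L_test, n=1.0, **args) - f_annealed(L_test, **args)
small_n = 1e-6
diff_n0 = (f_rs(L_test, n=small_n, **args) + math.log(2.0) / (0.8 * small_n)
           - f_quenched(L_test, beta=0.8, J=1.3))
```

Note that the $n \to 0$ form no longer depends on the forces $\mu$, in line with the remark above.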

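Now we turn to $n \to \infty$, i.e. to fully deterministic genetic dynamics. Here the free energy per monomer 
(11) reduces to 

$$ \lim_{n \to \infty} f_{RS} = \min_{\{L_\phi\}} \Big\{ \frac{J}{4}\sum_\phi L_\phi^2 - \Big\langle \max\Big[ \frac{1}{\beta}\log\Big( \sum_\phi e^{\beta J L_\phi} \Big) - \mu,\ \frac{1}{\beta}\log\Big( \sum_\phi e^{-\beta J L_\phi} \Big) + \mu \Big] \Big\rangle_\mu \Big\} \tag{19} $$

Equation (13), similarly, becomes 

$$ p = \Big\langle \mathrm{sgn}\Big[ \frac{1}{2\beta}\log\Big( \frac{\sum_\phi e^{\beta J L_\phi}}{\sum_\phi e^{-\beta J L_\phi}} \Big) - \mu \Big] \Big\rangle_\mu \tag{20} $$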



3.4. Phase diagrams for q = 2 and q = 3 

In view of our earlier results regarding the high-temperature states (where $L = L(1, 1, \ldots, 1, 1)$) and ground 
states (where $L = (p - 1, 0, \ldots, 0, p + 1)$) it is natural to assume that the solution of our model will never 
exhibit more than three distinct values for the order parameters $\{L_\phi\}$. Hence we may restrict our analysis to 
$q \in \{2, 3\}$ and expect at most quantitative changes to emerge for $q > 3$. In the phase diagrams to be given 
below for $q \in \{2, 3\}$ we focus on transitions marking bifurcations of local minima of the free energy surface as 
minimized in (11), rather than on thermodynamic transitions. Since all energetic and entropic barriers in our 
system are extensive, on the time-scales of any experiment (whether real or numerical, and especially in view 
of the extremely slow genetic process) one would never detect thermodynamic transitions. The system will 
in practice be found in the local free energy minimum in whose domain of attraction the initial configuration 
happened to lie. 

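For $q = 2$ we have only two order parameters, which we can write as $L_\pm = p \pm Z$. Insertion into (11) 
reveals that now the free energy minimization decouples conveniently into 

$$ f_{RS} = \min_{Z} \Big\{ \frac{1}{2}J Z^2 - \frac{1}{\beta}\log 2\cosh[\beta J Z] \Big\} + \min_{p} \Big\{ \frac{1}{2}J p^2 - \frac{1}{\tilde{\beta}}\big\langle \log 2\cosh[\tilde{\beta}(Jp - \mu)] \big\rangle_\mu \Big\} \tag{21} $$

giving the following independent order parameter equations: 

$$ Z = \tanh[\beta J Z], \qquad p = \big\langle \tanh[\tilde{\beta}(Jp - \mu)] \big\rangle_\mu \tag{22} $$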

The separation process of hydrophobic from hydrophilic monomers (i.e. the folding), measured by $Z$, 
disentangles from the species evolution, as measured by $p$; with the two processes each having their own 
independent transitions. This is a consequence of the fact that for $q = 2$ the orientation variables $\phi_i$ effectively 
become Ising spins, so that upon putting $\delta_{\phi_i, \phi_j} \to \frac{1}{2}[1 + \sigma_i \sigma_j]$, the 'folding' Hamiltonian in (5) reduces to that 
of a Mattis [21] magnet, from which the variables $\{\eta_i\}$ can be gauged away. 

The uniform high-temperature state $L_\phi = L\ \forall \phi$ (where $Z = 0$) always destabilizes via a second order 
phase transition at the Curie-Weiss temperature $T_c = J$, independent of the distribution $P(\mu)$. The phase 
phenomenology embodied in the equation for $p$, in contrast, is described by an equation for a mean-field 
ferro-magnet at inverse temperature $\tilde{\beta}$ and with random external fields, distributed (apart from a minus sign) 
according to $P(\mu)$. It will therefore be dependent on the choice made for $P(\mu)$, with invariance of the problem 
under $p \to -p$ for symmetric force distributions $P(\mu)$. 
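
The decoupling for $q = 2$ means that the two order parameter equations can be iterated independently. The sketch below assumes the equations $Z = \tanh[\beta J Z]$ and $p = \langle \tanh[\beta n(Jp - \mu)] \rangle_\mu$ of (22) for a discrete force distribution; plain fixed-point iteration and the chosen parameter values are our own illustrative choices, and the iteration only finds the solution in whose basin the starting values lie.

```python
import math


def solve_q2(beta, n, J, mus, weights, Z0=0.5, p0=0.0, iters=5000):
    """Iterate the two decoupled q = 2 equations to a fixed point."""
    Z, p = Z0, p0
    for _ in range(iters):
        Z = math.tanh(beta * J * Z)
        p = sum(w * math.tanh(beta * n * (J * p - mu)) for mu, w in zip(mus, weights))
    return Z, p


# Above the Curie-Weiss temperature T_c = J the separation order parameter vanishes.
Z_hot, p_hot = solve_q2(beta=0.5, n=1.0, J=1.0, mus=[0.2], weights=[1.0])
# Below T_c a non-zero Z appears, whatever the force distribution.
Z_cold, p_cold = solve_q2(beta=2.0, n=1.0, J=1.0, mus=[0.2], weights=[1.0])
```

The value of $Z$ is unaffected by the choice of `mus`, `weights` or `n`, which is the numerical counterpart of the gauge argument above.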

For $q = 3$ one has $\phi \in \{-\frac{2}{3}\pi, 0, \frac{2}{3}\pi\}$, and there will be three order parameters $\{L_\phi\}$. One can no longer map 
the model onto a Mattis-like system, and one no longer benefits from the resulting decoupling of orientation 
degrees of freedom from polarity degrees of freedom which occurred for $q = 2$. According to (17), the fully 
symmetric (i.e. swollen) state $L_\phi = L\ \forall \phi$ now destabilizes locally at the $n$-independent temperature $T_c = 2J/3$. 
In addition we know the ground state of the $q = 3$ system, for several choices of the force distribution $P(\mu)$ 
(see section 3.2). In contrast to the previous situation $q = 2$, however, there appear to be no significant‖ further 
analytical simplifications possible, and to obtain results on compact states and phase transitions in the regime 
of intermediate temperatures we have to resort mainly to a numerical analysis of our fundamental equations 
(11, 12, 13, 14). Bifurcations of new solutions of the saddle-point equations are marked by the smallest eigenvalue 
of the relevant Hessian becoming zero, i.e. 

$$ \big[ n(K - L) + L - M \big] x = -\frac{1}{2\beta J}\, x \tag{23} $$

(see Appendix A), with the $3 \times 3$ matrices as defined in (A.3, A.4, A.5). Extensive numerical analysis of the 
phases in the region $T < 2J/3$, where the swollen state is locally unstable, reveals a highly non-trivial phase 
phenomenology, with many simultaneously locally stable saddle-points. Giving all details and lines in this 
regime of compact states would be more distracting than informative. Instead, we will focus mainly on 
transitions in the region $T > 2J/3$, where the swollen state is locally stable, but where in spite of this one 
generally finds enhanced first order transitions to compact states whose presence is a direct consequence of our 
coupled dynamics of slow and fast processes¶. Note that for $q = 3$ we give phase diagrams with $\beta J$ along 
the horizontal axis (as opposed to $\tilde{\beta} J = n\beta J$) because, in contrast to $q = 2$, one here no longer finds that the 
non-trivial transitions can be expressed in terms of just two effective control parameters ($\beta n J$ and $\beta n \sigma$ or $\beta n \bar{\mu}$). 

4. Results for δ-distributed genetic forces 

In the simplest case $P(\mu) = \delta[\mu - \bar{\mu}]$ (i.e. $\mu_i = \bar{\mu}$ for all $i$) the functionality potential $V(\boldsymbol{\eta})$ in the genetic 
dynamics favors a single polarity type. Since the general dynamical behaviour of our system can be regarded as 
a stochastic version of the combined $T = 0$ (i.e. deterministic) folding process and the $n \to \infty$ (i.e. deterministic) 
genetic dynamics, we first inspect the limits $T \to 0$ and $n \to \infty$. 

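4.1. The deterministic limits: T → 0 and n → ∞ 

We first calculate the ground state (i.e. deterministic evolution of configurational angles, $T = 0$). We find the 
function $e(p)$ in (18) reducing to 

$$ e(p) = \frac{1}{2}p^2 - \frac{1}{2} - |p - \bar{\mu}/J| $$

Upon working out the derivatives of $e(p)$ in the different regimes (characterized by different values of 
$\mathrm{sgn}[p - \bar{\mu}/J]$) one arrives at the following result: 

$$ E_0 = -J - |\bar{\mu}|, \qquad L = 2p\,(1, 0, \ldots, 0), \qquad p = \begin{cases} -\mathrm{sgn}[\bar{\mu}] & \text{for } \bar{\mu} \neq 0 \\ \pm 1 & \text{for } \bar{\mu} = 0 \end{cases} \tag{24} $$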

‖ It will be clear that our equations can still be simplified partially upon making specific ansätze for the saddle-point as in [17], 
such as the uniform (i.e. swollen) state $L_\phi = L\ \forall \phi$ (which has already been studied in detail for arbitrary $q$), or states of the form 
$L = (L_1, L_2, L_2)$ (where our saddle-point and stability problems become 2-dimensional). Note that our system is invariant under 
permutations of the three monomer orientations, so that states like $L = (L_1, L_2, L_2)$ and $L = (L_2, L_2, L_1)$ are equivalent. 
¶ In the long-range limit of [17] one can also find discontinuous bifurcations for very specific values of the control parameters 
(marking the creation of a locally stable state with relatively high free energy, and hence non-thermodynamic), and very close to the 
general second order transition at $T_c = 2J/q$; this was missed in [17]. 

For $\bar{\mu} > 0$ the ground state is one where all monomers have become hydrophobic, and are found at exactly 
the same location relative to the chain. For $\bar{\mu} < 0$ the ground state is one where all monomers have become 
hydrophilic, and are again all oriented at the same location. However, one finds also that for $|\bar{\mu}| < J$ the 
solutions $p = \pm 1$ are both local minima of $e(p)$ (although the state $p = \mathrm{sgn}[\bar{\mu}]$ will have an energy higher than 
the ground state $p = -\mathrm{sgn}[\bar{\mu}]$ as long as $\bar{\mu} \neq 0$). This will cause remanence effects in the low temperature 
dynamics. For $\bar{\mu} = 0$ both single-species states $p = \pm 1$ give equivalent (local and global) minima. 

Next we turn to $n \to \infty$, i.e. deterministic genetic dynamics. Equation (13), together with the order- 
parameter equation (14), give us three candidate solution classes, $p \in \{-1, 0, 1\}$ for $n \to \infty$. Only $p = \pm 1$ will 
be potentially stable, separated by the unstable fixed-point $p = 0$: 

$$ p = \mathrm{sgn}\Big[ \frac{1}{2\beta}\log\Big( \frac{\sum_\phi e^{\beta J L_\phi}}{\sum_\phi e^{-\beta J L_\phi}} \Big) - \bar{\mu} \Big] $$

For the swollen state $L_\phi = 2p/q\ \forall \phi$ this implies the following: for $|\bar{\mu}| > 2J/q$ there is only the solution 
$p = -\mathrm{sgn}[\bar{\mu}]$, for $|\bar{\mu}| < 2J/q$ both solutions $p = \pm 1$ are locally stable (but with the lowest free energy obtained 
for $p = -\mathrm{sgn}[\bar{\mu}]$). 

In both cases the system evolves towards a single-species state, with either all hydrophilic monomers (the 
preferred option when $\bar{\mu} < 0$) or all hydrophobic monomers (the preferred option when $\bar{\mu} > 0$). For large $|\bar{\mu}|$ only 
the preferred states are locally stable, but for sufficiently small $|\bar{\mu}|$ both single-species states are. This behaviour, 
which is not desirable from a biological point of view, results from the simple fact that for $P(\mu) = \delta[\mu - \bar{\mu}]$ the 
single-species state with appropriate polarity sign is energetically favorable to both the folding process and the 
genetic selection process. 



4.2. Phase diagrams for q = 2 

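According to (22), for $P(\mu) = \delta[\mu - \bar{\mu}]$ we find the simple Curie-Weiss equation $p = \tanh[\tilde{\beta}(Jp - \bar{\mu})]$. Upon 
inverting this to $\tilde{\beta}\bar{\mu} = \tilde{\beta}Jp - \frac{1}{2}\log[(1+p)/(1-p)]$ and calculating the derivative with respect to $p$, one can work 
out the bifurcation properties of the solution: 

$\tilde{\beta}J > 1$ and $|\bar{\mu}| < \mu_c$: two local minima $p$ 
elsewhere: one local minimum 

with 

$$ \tilde{\beta}\mu_c = \sqrt{\tilde{\beta}J}\sqrt{\tilde{\beta}J - 1} - \frac{1}{2}\log\left[ \frac{\sqrt{\tilde{\beta}J} + \sqrt{\tilde{\beta}J - 1}}{\sqrt{\tilde{\beta}J} - \sqrt{\tilde{\beta}J - 1}} \right] \tag{25} $$

The transitions at $\bar{\mu} = \pm\mu_c$ are first-order, except for the common point $(\tilde{\beta}J, \tilde{\beta}\bar{\mu}) = (1, 0)$ where they become 
second-order. In the region of multiple local minima, for $\bar{\mu} \neq 0$ the global minimum has $\mathrm{sgn}[p] = -\mathrm{sgn}[\bar{\mu}]$, 
whereas for $\bar{\mu} = 0$ the local minima for $\tilde{\beta}J > 1$ are equivalent. We note 

$$ \tilde{\beta}\mu_c = \tilde{\beta}J + O(\log(\tilde{\beta}J)) \qquad (\tilde{\beta}J \to \infty) $$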

This agrees with our results regarding the ground state. We have now determined all phases and transitions: 
there is one second order transition at $\beta J = 1$ from a swollen state with uniformly distributed monomer 
orientations to a compact state with separation of polarity types, and two first-order transitions at $\bar{\mu} = 
\pm\mu_c(\beta, n, J)$ (in the region $\beta n J > 1$) marking the creation of multiple locally stable values for the average 
polarity. These lines are shown in the $(\tilde{\beta}J, \tilde{\beta}\bar{\mu})$ phase diagram, in figure 1. We denote swollen phases with $\ell$ 
possible locally stable values of $p$ as S$\ell$, and compact phases with $\ell$ possible locally stable values of $p$ as C$\ell$. 
The number of possible phases depends explicitly on the value of $n$, since phase S2 exists only for $n > 1$. 

Figure 1. Phase diagrams for $q = 2$ and $P(\mu) = \delta[\mu - \bar{\mu}]$. The four possible phases are {S1, S2} (swollen phases 
with 1 or 2 possible locally stable values of $p$) and {C1, C2} (compact phases with 1 or 2 possible locally stable 
values of $p$). The S1 → C1 and S2 → C2 transitions (at $\tilde{\beta}J = n$) are second order (they mark the creation of 
$Z \neq 0$ solutions of (22)). The S1 → S2 and C1 → C2 transitions are first order (except at $\bar{\mu} = 0$, where the 
transition is second order). Left diagram: $n = 5$. In this and subsequent phase diagrams all first and second 
order transitions are drawn as dashed and solid curves, respectively. Right diagram: $n = \ldots$ . Note that $\tilde{\beta} = \beta n$ 
and that phase S2 exists only for $n > 1$. 
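
The location of the first-order lines for $q = 2$ can be verified numerically. The sketch below assumes the critical force in the form $\tilde{\beta}\mu_c = \sqrt{\tilde{\beta}J}\sqrt{\tilde{\beta}J - 1} - \frac{1}{2}\log[(\sqrt{\tilde{\beta}J} + \sqrt{\tilde{\beta}J - 1})/(\sqrt{\tilde{\beta}J} - \sqrt{\tilde{\beta}J - 1})]$, which follows from requiring a vanishing derivative in the inverted Curie-Weiss equation, and counts the locally stable solutions of $p = \tanh[\tilde{\beta}(Jp - \bar{\mu})]$ on a grid. The arguments `bJ` and `bmu` stand for $\tilde{\beta}J$ and $\tilde{\beta}\bar{\mu}$; the function names and the grid-based counting are our own choices.

```python
import math


def beta_mu_c(bJ):
    """Critical value of beta_tilde * mu_bar for given bJ = beta_tilde * J >= 1."""
    s = math.sqrt(bJ)
    t = math.sqrt(bJ - 1.0)
    return s * t - 0.5 * math.log((s + t) / (s - t))


def count_stable_solutions(bJ, bmu, points=20001):
    """Number of locally stable solutions of p = tanh(bJ * p - bmu) in [-1, 1].

    Stable solutions (free energy minima) are points where
    g(p) = tanh(bJ * p - bmu) - p changes sign from positive to negative.
    """
    grid = [-1.0 + 2.0 * k / (points - 1) for k in range(points)]
    vals = [math.tanh(bJ * p - bmu) - p for p in grid]
    return sum(1 for a, b in zip(vals, vals[1:]) if a > 0.0 and b <= 0.0)


critical = beta_mu_c(2.0)                 # approximately 0.533
inside = count_stable_solutions(2.0, 0.3)   # |bmu| below the critical value
outside = count_stable_solutions(2.0, 0.7)  # |bmu| above the critical value
```

At $\tilde{\beta}J = 1$ the critical force vanishes, consistent with the two first-order lines meeting at the point $(1, 0)$.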



4.3. Phase diagrams for q = 3 

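Since for both $T \to 0$ and $T \to \infty$ there are only two non-identical values for the order parameters $L_\phi$, 
we expect the solution to be of the form $L = (L_1, L_2, L_2)$ at all temperatures. We thus insert the ansatz 
$\{L_{\pm 2\pi/3} = \frac{1}{3}(2p + Z),\ L_0 = \frac{1}{3}(2p - 2Z)\}$ into our saddle-point equations and find 

$$ p = \frac{ e^{-\beta n[\bar{\mu} - \frac{2}{3}Jp]} \big( 2e^{\frac{1}{3}\beta J Z} + e^{-\frac{2}{3}\beta J Z} \big)^n - e^{\beta n[\bar{\mu} - \frac{2}{3}Jp]} \big( 2e^{-\frac{1}{3}\beta J Z} + e^{\frac{2}{3}\beta J Z} \big)^n }{ e^{-\beta n[\bar{\mu} - \frac{2}{3}Jp]} \big( 2e^{\frac{1}{3}\beta J Z} + e^{-\frac{2}{3}\beta J Z} \big)^n + e^{\beta n[\bar{\mu} - \frac{2}{3}Jp]} \big( 2e^{-\frac{1}{3}\beta J Z} + e^{\frac{2}{3}\beta J Z} \big)^n } \tag{26} $$

$$ Z = F[Z; p], \qquad F[Z; p] = \frac{2p[1 - \cosh(\beta J Z)] + 6\sinh(\beta J Z)}{5 + 4\cosh(\beta J Z)} \tag{27} $$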



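with a corresponding simplification of the bifurcation condition (23). For $Z = 0$ these equations bring us back 
to the swollen state $L_\phi = \frac{2}{3}p$, with $p = \tanh[\beta n(\frac{2}{3}Jp - \bar{\mu})]$. Comparison with the corresponding equation for 
$q = 2$ shows that the properties of the swollen state with $q = 3$ can be obtained from those derived for $q = 2$ 
via the substitution $J \to \frac{2}{3}J$. This gives 

swollen state: 
$\tilde{\beta}J > \frac{3}{2}$ and $|\bar{\mu}| < \mu_c$: two local minima $p$ 
elsewhere: one local minimum 

with 

$$ \tilde{\beta}\mu_c = \sqrt{\tfrac{2}{3}\tilde{\beta}J}\sqrt{\tfrac{2}{3}\tilde{\beta}J - 1} - \frac{1}{2}\log\left[ \frac{\sqrt{\tfrac{2}{3}\tilde{\beta}J} + \sqrt{\tfrac{2}{3}\tilde{\beta}J - 1}}{\sqrt{\tfrac{2}{3}\tilde{\beta}J} - \sqrt{\tfrac{2}{3}\tilde{\beta}J - 1}} \right] \tag{28} $$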



The transitions at $\bar{\mu} = \pm\mu_c$ are first-order, except for the point $(\tilde{\beta}J, \tilde{\beta}\bar{\mu}) = (\frac{3}{2}, 0)$ where they are second-order. 
Since the swollen state is locally unstable (against collapsed solutions) for $\beta J > \frac{3}{2}$, the transitions at $\bar{\mu} = \pm\mu_c$ 
can be observed only for $n > 1$. It follows from $F[Z; p] = \frac{2}{3}\beta J Z + O(Z^2)$ that the swollen state, $Z = 0$, indeed 
destabilizes at $\beta J = \frac{3}{2}$ in favor of a collapsed state of the type above, with $Z \neq 0$. 

Figure 2. Phase diagrams for $q = 3$ and $P(\mu) = \delta[\mu - \bar{\mu}]$. The possible phases are {S$\ell$} (swollen phases with $\ell$ 
possible locally stable values of $p$), {C$\ell$} (compact phases with $\ell$ possible locally stable values of $p$), {S$\ell$+C$\ell$} 
(phases with $2\ell$ possible locally stable states of $p$, half of which are compact), and C0 (compact states only). 
All transitions are first order, except for the one at $\beta J = \frac{3}{2}$ (which is second order). Upper left diagram: $n = \ldots$ . 
Upper right diagram: $n = 1$. Lower left diagram: $n = \ldots$ . Lower right diagram: $n = \ldots$ . 

The main qualitative change observed for $q = 3$ is the emergence of prominent transitions to compact states before the temperature where the swollen state becomes locally unstable. These are in fact also found to be present in the long range limit of [17], where there is no genetic dynamics and where $p$ is a control parameter, but far less prominently and with very small attraction domains of the associated newly created free energy minima. Note that for $n \to 0$ our present model will not exhibit these first order transitions, since this would reduce our system to the $p = 0$ case of [17] (the first order transitions are found only for $p \neq 0$).
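The coupled equations (26) and (27) for $p$ and $Z$ lend themselves to a simple damped fixed-point iteration. The following is a minimal sketch under stated assumptions: it takes the $p$-equation in the form $p = (A - B)/(A + B)$ with $A = e^{-\beta n[\bar\mu - \frac{2}{3}Jp]}(2e^{\frac{1}{3}\beta JZ} + e^{-\frac{2}{3}\beta JZ})^n$ and $B = e^{\beta n[\bar\mu - \frac{2}{3}Jp]}(2e^{-\frac{1}{3}\beta JZ} + e^{\frac{2}{3}\beta JZ})^n$, and rewrites it as $\tanh[\frac{1}{2}(\log A - \log B)]$ to avoid overflow. The function names, the damping and the iteration count are illustrative choices, not part of the original analysis.

```python
import math

def F(Z, p, beta_J):
    """Right-hand side of Z = F[Z; p] for q = 3 (equation (27))."""
    u = beta_J * Z
    return (2.0 * p * (1.0 - math.cosh(u)) + 6.0 * math.sinh(u)) / (5.0 + 4.0 * math.cosh(u))

def p_update(p, Z, beta, J, n, mu_bar):
    """Right-hand side of the p-equation (26), written as tanh[(log A - log B)/2]."""
    u = beta * J * Z
    shift = beta * n * (mu_bar - 2.0 * J * p / 3.0)
    log_a = -shift + n * math.log(2.0 * math.exp(u / 3.0) + math.exp(-2.0 * u / 3.0))
    log_b = shift + n * math.log(2.0 * math.exp(-u / 3.0) + math.exp(2.0 * u / 3.0))
    return math.tanh(0.5 * (log_a - log_b))

def solve(beta, J, n, mu_bar, p0, Z0, damping=0.5, iters=2000):
    """Damped iteration of (26)-(27); the result depends on the starting point (p0, Z0)."""
    p, Z = p0, Z0
    for _ in range(iters):
        p = (1.0 - damping) * p + damping * p_update(p, Z, beta, J, n, mu_bar)
        Z = (1.0 - damping) * Z + damping * F(Z, p, beta * J)
    return p, Z
```

Because several locally stable solutions can coexist, different starting points $(p_0, Z_0)$ should be tried when mapping out a phase diagram; a solution with $Z = 0$ corresponds to the swollen state.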



5. Results for zero-average binary genetic forces

Our second example is the symmetric binary distribution $P(\mu) = \frac{1}{2}\delta[\mu + \sigma] + \frac{1}{2}\delta[\mu - \sigma]$, with $\sigma > 0$. Here the functionality potential $V(\eta)$ in the genetic dynamics discourages the biologically undesirable single-species states with $p = \pm 1$. Again we first inspect the limits $T \to 0$ and $n \to \infty$.

5.1. The deterministic limits: $T \to 0$ and $n \to \infty$

We first calculate the ground state (deterministic evolution of configurational angles: $T = 0$). For the present choice of $P(\mu)$ one finds for the function $e(p)$:

$$ e(p) = \frac{1}{2}p^2 - \frac{1}{2} - \frac{1}{2}\left|p + \frac{\sigma}{J}\right| - \frac{1}{2}\left|p - \frac{\sigma}{J}\right| \tag{29} $$

Upon working out the details in the three regimes $\{p < -\sigma/J,\ |p| < \sigma/J,\ p > \sigma/J\}$ one finds that for $\sigma > 0$ there is always a local minimum of $e(p)$ at $p = 0$, with $E = -\frac{1}{2}J - \sigma$, and that for $\sigma < J$ there are two additional local minima of $e(p)$ at $p = \pm 1$ with $E = -J$. The latter will replace $p = 0$ as the global minimum for $\sigma < \frac{1}{2}J$:

$\sigma > J/2$: $E = -\frac{1}{2}J - \sigma$, $p = 0$, $L = (-1, 0, \ldots, 0, 1)$

$\sigma < J/2$: $E = -J$, $p = \pm 1$, $L = 2p(1, 0, \ldots, 0)$

For $\sigma > \frac{1}{2}J$ the ground state ($p = 0$) is one where half of the monomers are hydrophilic and half are hydrophobic (in line with the requirements of the potential $V(\eta)$), and where the two species are perfectly separated with all hydrophobic monomers clustered at one site, and all hydrophilic ones clustered at another. The origin of the (undesirable) single-species minima at $p = \pm 1$, for $\sigma < J/2$, can be understood upon realizing that for $\sigma \to 0$ one returns to the $\delta$-distribution studied earlier, with $\bar\mu = 0$. For $\sigma = J/2$ both types of states (multiple-species and single-species) give energetically equivalent minima.
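The ground-state statements above can be checked with a few lines of code. This is a minimal sketch, assuming that $e(p) = \frac{1}{2}p^2 - \frac{1}{2} - \frac{1}{2}|p + \sigma/J| - \frac{1}{2}|p - \sigma/J|$ on $p \in [-1, 1]$ and that the quoted energies are $E = J\,e(p)$ (an identification inferred from the values $E = -\frac{1}{2}J - \sigma$ and $E = -J$, not stated explicitly in the text). The function names and the grid search are illustrative.

```python
def e(p, sigma, J):
    """Ground-state function e(p) for symmetric binary forces mu = +/- sigma."""
    return 0.5 * p * p - 0.5 - 0.5 * abs(p + sigma / J) - 0.5 * abs(p - sigma / J)

def ground_state(sigma, J, grid=2001):
    """Return (p, E) minimizing J*e(p) over a uniform grid on [-1, 1]."""
    ps = [-1.0 + 2.0 * k / (grid - 1) for k in range(grid)]
    best = min(ps, key=lambda p: e(p, sigma, J))
    return best, J * e(best, sigma, J)
```

For example, `ground_state(0.8, 1.0)` selects the mixed state $p = 0$, whereas `ground_state(0.3, 1.0)` selects one of the two equivalent single-species states $|p| = 1$.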
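For $n \to \infty$ (deterministic genetic dynamics) we get

$$ \lim_{n\to\infty} f_{\rm RS} = \min_{\{L_\phi\}}\left\{ \frac{1}{4}J\sum_\phi L_\phi^2 - \frac{1}{2\beta}\log\Big(\sum_\phi e^{\beta JL_\phi}\Big) - \frac{1}{2\beta}\log\Big(\sum_\phi e^{-\beta JL_\phi}\Big) - \frac{1}{2}\left|\sigma - \frac{1}{2\beta}\log\left[\frac{\sum_\phi e^{\beta JL_\phi}}{\sum_\phi e^{-\beta JL_\phi}}\right]\right| - \frac{1}{2}\left|\sigma + \frac{1}{2\beta}\log\left[\frac{\sum_\phi e^{\beta JL_\phi}}{\sum_\phi e^{-\beta JL_\phi}}\right]\right| \right\} \tag{30} $$

$$ p = \frac{1}{2}\,{\rm sgn}\left[\frac{1}{2\beta}\log\left[\frac{\sum_\phi e^{\beta JL_\phi}}{\sum_\phi e^{-\beta JL_\phi}}\right] - \sigma\right] + \frac{1}{2}\,{\rm sgn}\left[\frac{1}{2\beta}\log\left[\frac{\sum_\phi e^{\beta JL_\phi}}{\sum_\phi e^{-\beta JL_\phi}}\right] + \sigma\right] \tag{31} $$

We now have five candidate solutions $p \in \{-1, -\frac{1}{2}, 0, \frac{1}{2}, 1\}$ (due to the symmetry under $\{L_\phi, p\} \to \{-L_\phi, -p\}$ of the minimization problem, the $p = \pm 1$ and $p = \pm\frac{1}{2}$ solutions are mutually equivalent). The potentially stable ones are $p \in \{-1, 0, 1\}$, separated by the unstable fixed-points $p = \pm\frac{1}{2}$:

$p = 1$:
$$ \lim_{n\to\infty} f_{\rm RS} = \frac{1}{4}J\sum_\phi L_\phi^2 - \frac{1}{\beta}\log\Big(\sum_\phi e^{\beta JL_\phi}\Big), \qquad L_\phi = \frac{2e^{\beta JL_\phi}}{\sum_{\phi'} e^{\beta JL_{\phi'}}}, \qquad \frac{1}{2\beta}\log\left[\frac{\sum_\phi e^{\beta JL_\phi}}{\sum_\phi e^{-\beta JL_\phi}}\right] > \sigma $$

$p = 0$:
$$ \lim_{n\to\infty} f_{\rm RS} = \frac{1}{4}J\sum_\phi L_\phi^2 - \sigma - \frac{1}{2\beta}\log\Big(\sum_\phi e^{\beta JL_\phi}\Big) - \frac{1}{2\beta}\log\Big(\sum_\phi e^{-\beta JL_\phi}\Big), \qquad L_\phi = \frac{e^{\beta JL_\phi}}{\sum_{\phi'} e^{\beta JL_{\phi'}}} - \frac{e^{-\beta JL_\phi}}{\sum_{\phi'} e^{-\beta JL_{\phi'}}}, \qquad \left|\frac{1}{2\beta}\log\left[\frac{\sum_\phi e^{\beta JL_\phi}}{\sum_\phi e^{-\beta JL_\phi}}\right]\right| < \sigma $$

$p = -1$:
$$ \lim_{n\to\infty} f_{\rm RS} = \frac{1}{4}J\sum_\phi L_\phi^2 - \frac{1}{\beta}\log\Big(\sum_\phi e^{-\beta JL_\phi}\Big), \qquad L_\phi = -\frac{2e^{-\beta JL_\phi}}{\sum_{\phi'} e^{-\beta JL_{\phi'}}}, \qquad \frac{1}{2\beta}\log\left[\frac{\sum_\phi e^{\beta JL_\phi}}{\sum_\phi e^{-\beta JL_\phi}}\right] < -\sigma $$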



For the swollen state $L_\phi = 2p/q$ this implies: for $\sigma > 2J/q$ there is only the solution $p = 0$, for $\sigma < 2J/q$ also the solutions $p = \pm 1$ are locally stable (but with a lower free energy than the $p = 0$ state only when $\sigma < J/q$).

The general picture emerging from the above two limit cases is, as expected, that for sufficiently large $\sigma$ the single-species states $p = \pm 1$ (which continue to be energetically favorable for the fast process) will indeed be replaced by a biologically more interesting state with equal fractions of hydrophilic and hydrophobic monomers.

5.2. Phase diagrams for q = 2

Now the order parameter equation becomes (with $\tilde\beta = \beta n$)

$$ p = G(p), \qquad G(p) = \frac{1}{2}\tanh[\tilde\beta(Jp - \sigma)] + \frac{1}{2}\tanh[\tilde\beta(Jp + \sigma)] \tag{32} $$

Due to the symmetry $P(\mu) = P(-\mu)$ we always find the state $p = 0$ (equal fractions of hydrophobic and hydrophilic monomers). We expand the function $G(p)$ in powers of $p$,

$$ G(p) = \tilde\beta J\left[1 - \tanh^2(\tilde\beta\sigma)\right]p - \frac{1}{3}(\tilde\beta J)^3\left[1 - 4\tanh^2(\tilde\beta\sigma) + 3\tanh^4(\tilde\beta\sigma)\right]p^3 + O(p^5) $$

and conclude that the state $p = 0$ de-stabilizes when $\tanh^2(\tilde\beta\sigma) = 1 - 1/\tilde\beta J$, or, equivalently, at

$$ \tilde\beta\sigma_c = \frac{1}{2}\log\left[\frac{\sqrt{\tilde\beta J} + \sqrt{\tilde\beta J - 1}}{\sqrt{\tilde\beta J} - \sqrt{\tilde\beta J - 1}}\right] \tag{33} $$

and that this corresponds to a true second-order transition as long as $\tilde\beta J < 3/2$ (so that $G'''(0)$ is negative). We note that

$$ \tilde\beta\sigma_c = (\tilde\beta J - 1)^{1/2} + O(\tilde\beta J - 1) \qquad (\tilde\beta J \to 1) $$
$$ \tilde\beta\sigma_c = \frac{1}{2}\log(\tilde\beta J) + O(1) \qquad (\tilde\beta J \to \infty) $$

For $\tilde\beta J > 3/2$, however, there are first-order transitions away from the state $p = 0$ at temperatures higher than the one defined by (33). The locations in the phase diagram of the latter, which start at the triple point $(\tilde\beta J, \tilde\beta\sigma) = (\frac{3}{2}, {\rm arctanh}[1/\sqrt{3}])$, are to be solved (numerically) from the coupled transcendental equations $p = G(p)$ and $1 = G'(p)$, i.e.

$$ p = \frac{1}{2}\tanh[\tilde\beta(Jp - \sigma)] + \frac{1}{2}\tanh[\tilde\beta(Jp + \sigma)] \tag{34} $$
$$ 1 = \tilde\beta J\left\{1 - \frac{1}{2}\tanh^2[\tilde\beta(Jp - \sigma)] - \frac{1}{2}\tanh^2[\tilde\beta(Jp + \sigma)]\right\} \tag{35} $$

It is now (in contrast to the case of $\delta$-distributed genetic forces) possible to have three locally stable values for the average polarity $p$ (one of which is zero, the other two are different in sign only). This exhausts the phases and transitions: there is one second order transition at $\beta J = 1$ from a swollen state with uniformly distributed monomer orientations to a compact state with separation of polarity types, two second-order transition lines (in the region $\beta nJ < \frac{3}{2}$) marking the creation of multiple locally stable values for the average polarity together with de-stabilization of the state $p = 0$, and two first order transition lines where two nonzero values of $p$ bifurcate discontinuously (without affecting the stability of $p = 0$). These lines are shown in the $(\tilde\beta J, \tilde\beta\sigma)$ phase diagram, in figure 3. We denote swollen phases with $\ell$ locally stable values of $p$ as S$\ell$, and compact phases with $\ell$ locally stable values of $p$ as C$\ell$. The number of possible phases is again found to depend explicitly on $n$: phase S2 exists only for $n > 1$, and S3 exists only for $n > \frac{3}{2}$.
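The number of locally stable polarities for $q = 2$ and binary forces can be explored numerically. The sketch below assumes the order-parameter equation $p = G(p)$ with $G(p) = \frac{1}{2}\tanh[\tilde\beta(Jp - \sigma)] + \frac{1}{2}\tanh[\tilde\beta(Jp + \sigma)]$ and treats a solution as locally stable when $G(p) - p$ changes sign from positive to negative (i.e. $G'(p) < 1$). The arguments `bJ` and `bs` stand for $\tilde\beta J$ and $\tilde\beta\sigma$; all names and the grid-plus-bisection strategy are illustrative choices.

```python
import math

def G(p, bJ, bs):
    """G(p) for symmetric binary forces, with bJ = beta~*J and bs = beta~*sigma."""
    return 0.5 * math.tanh(bJ * p - bs) + 0.5 * math.tanh(bJ * p + bs)

def beta_sigma_c(bJ):
    """Location of the de-stabilization of p = 0, valid for bJ >= 1."""
    r, s = math.sqrt(bJ), math.sqrt(bJ - 1.0)
    return 0.5 * math.log((r + s) / (r - s))

def stable_fixed_points(bJ, bs, grid=4000, tol=1e-12):
    """Stable solutions of p = G(p) on [-1, 1], found by scanning and bisection."""
    h = lambda p: G(p, bJ, bs) - p
    # An even number of grid points keeps p = 0 (where h vanishes exactly) off the grid.
    ps = [-1.0 + 2.0 * k / (grid - 1) for k in range(grid)]
    roots = []
    for a, b in zip(ps, ps[1:]):
        if h(a) > 0.0 > h(b):          # downward crossing: locally stable
            while b - a > tol:
                m = 0.5 * (a + b)
                if h(m) > 0.0:
                    a = m
                else:
                    b = m
            roots.append(0.5 * (a + b))
    return roots
```

With these conventions, `stable_fixed_points(2.0, 0.2)` returns two nonzero values of opposite sign, `stable_fixed_points(2.0, 2.0)` returns only $p = 0$, and `stable_fixed_points(3.0, 1.3)` returns three values, illustrating the coexistence region beyond the triple point.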

5.3. Phase diagrams for q = 3

For the force distribution $P(\mu) = \frac{1}{2}\delta[\mu - \sigma] + \frac{1}{2}\delta[\mu + \sigma]$ we know the ground state to be $L = (-1, 0, 1)$ for $\sigma > J/2$ (with $p = 0$) and $L = 2p(1, 0, 0)$ for $\sigma < J/2$ (with $p = \pm 1$). We therefore expect the solution to be of the form $L = (L_1, L_2, L_2)$ at all temperatures at most when $\sigma < J/2$. Inserting $\{L_{\pm 2\pi/3} = \frac{1}{3}(2p + Z),\ L_0 = \frac{1}{3}(2p - 2Z)\}$ into our saddle-point equations now leads to

$$ p = \frac{1}{2}\,\frac{e^{-\beta n[\sigma - \frac{2}{3}Jp]}\left(2e^{\frac{1}{3}\beta JZ} + e^{-\frac{2}{3}\beta JZ}\right)^n - e^{\beta n[\sigma - \frac{2}{3}Jp]}\left(2e^{-\frac{1}{3}\beta JZ} + e^{\frac{2}{3}\beta JZ}\right)^n}{e^{-\beta n[\sigma - \frac{2}{3}Jp]}\left(2e^{\frac{1}{3}\beta JZ} + e^{-\frac{2}{3}\beta JZ}\right)^n + e^{\beta n[\sigma - \frac{2}{3}Jp]}\left(2e^{-\frac{1}{3}\beta JZ} + e^{\frac{2}{3}\beta JZ}\right)^n} + \frac{1}{2}\,\frac{e^{\beta n[\sigma + \frac{2}{3}Jp]}\left(2e^{\frac{1}{3}\beta JZ} + e^{-\frac{2}{3}\beta JZ}\right)^n - e^{-\beta n[\sigma + \frac{2}{3}Jp]}\left(2e^{-\frac{1}{3}\beta JZ} + e^{\frac{2}{3}\beta JZ}\right)^n}{e^{\beta n[\sigma + \frac{2}{3}Jp]}\left(2e^{\frac{1}{3}\beta JZ} + e^{-\frac{2}{3}\beta JZ}\right)^n + e^{-\beta n[\sigma + \frac{2}{3}Jp]}\left(2e^{-\frac{1}{3}\beta JZ} + e^{\frac{2}{3}\beta JZ}\right)^n} \tag{36} $$

$$ Z = F[Z;p], \qquad F[Z;p] = \frac{2p[1 - \cosh(\beta JZ)] + 6\sinh(\beta JZ)}{5 + 4\cosh(\beta JZ)} \tag{37} $$

Figure 3. Phase diagrams for $q = 2$ and $P(\mu) = \frac{1}{2}\delta[\mu - \sigma] + \frac{1}{2}\delta[\mu + \sigma]$. The six possible phases are {S1, S2, S3} (swollen phases with 1, 2 or 3 possible locally stable values of $p$) and {C1, C2, C3} (compact phases with 1, 2 or 3 possible locally stable values of $p$). The S1 $\to$ C1 transitions (at $\tilde\beta J = n$) as well as the S1 $\to$ S2 and C1 $\to$ C2 transitions are all second order. The S1 $\to$ S3 and C1 $\to$ C3 transitions are first order. Left diagram: $n = 5$. Middle diagram: $n =$ [illegible]. Right diagram: $n =$ [illegible]. Note that $\tilde\beta = \beta n$, that phase S2 exists only for $n > 1$, and that phase S3 exists only for $n > \frac{3}{2}$.

Figure 4. Phase diagrams for $q = 3$ and $P(\mu) = \frac{1}{2}\delta[\mu - \sigma] + \frac{1}{2}\delta[\mu + \sigma]$, with $n = 2$. The possible phases are {S$\ell$} (swollen phases with $\ell$ possible locally stable values of $p$), {C$\ell$} (compact phases with $\ell$ possible locally stable values of $p$), {S$\ell$+C$\ell$} (phases with $2\ell$ possible locally stable states of $p$, half of which are compact), and CO (compact states only). First order transitions are indicated by dashed curves, second order ones by solid curves. The right diagram is an enlargement of one of the two areas in the left diagram where all lines nearly meet.



(with a corresponding simplification of the bifurcation condition (23)). For $Z = 0$ these equations bring us back to the swollen state $L_\phi = \frac{2}{3}p$, but now with

$$ p = \frac{1}{2}\tanh[\beta n(\tfrac{2}{3}Jp - \sigma)] + \frac{1}{2}\tanh[\beta n(\tfrac{2}{3}Jp + \sigma)] $$

Comparison with the corresponding equation for $q = 2$ shows that, for $\sigma < \frac{1}{2}J$, the properties of the swollen state with $q = 3$ can again be obtained from those derived for $q = 2$ via the substitution $J \to \frac{2}{3}J$. This tells us that the state $p = 0$ (which always solves the saddle-point equation) de-stabilizes at (with $\tilde\beta = \beta n$)

$$ \tilde\beta\sigma_c = \frac{1}{2}\log\left[\frac{\sqrt{\frac{2}{3}\tilde\beta J} + \sqrt{\frac{2}{3}\tilde\beta J - 1}}{\sqrt{\frac{2}{3}\tilde\beta J} - \sqrt{\frac{2}{3}\tilde\beta J - 1}}\right] \tag{38} $$

and that this corresponds to a true second-order transition as long as $\tilde\beta J < 9/4$. For $\tilde\beta J > 9/4$, however, there are first-order transitions away from the state $p = 0$ at temperatures higher than the one defined by (38), which start at the triple point $(\tilde\beta J, \tilde\beta\sigma) = (\frac{9}{4}, {\rm arctanh}[1/\sqrt{3}])$, to be solved (numerically) from the coupled equations

$$ p = \frac{1}{2}\tanh[\tilde\beta(\tfrac{2}{3}Jp - \sigma)] + \frac{1}{2}\tanh[\tilde\beta(\tfrac{2}{3}Jp + \sigma)] \tag{39} $$
$$ 1 = \frac{2}{3}\tilde\beta J\left\{1 - \frac{1}{2}\tanh^2[\tilde\beta(\tfrac{2}{3}Jp - \sigma)] - \frac{1}{2}\tanh^2[\tilde\beta(\tfrac{2}{3}Jp + \sigma)]\right\} \tag{40} $$

Thus we can have, even for the swollen state, three locally stable values for the average polarity $p$ (one of which is zero, the other two are different in sign only). Since the swollen state is locally unstable (against collapsed solutions) for $\beta J > \frac{3}{2}$, however, the above transitions can be seen only for $n > 1$. Since $F[Z;p] = \frac{2}{3}\beta JZ + O(Z^2)$, the swollen state, $Z = 0$, destabilizes at $\beta J = \frac{3}{2}$ in favor of a collapsed state of the type $L = (L_1, L_2, L_2)$.

Figure 5. Onset of multiplicity of locally stable states at $T = 0$, for Gaussian-distributed forces $\mu$ (with average $\bar\mu$ and variance $\sigma^2$). Solid curves: critical lines $\pm\bar\mu_c/J$ in parameter space as defined in (42), terminating at $\sigma/J = \sqrt{2/\pi}$. Region I: a single local and global minimum (the ground state), with ${\rm sgn}[p] = -{\rm sgn}[\bar\mu]$. Region II: an additional local minimum (with energy higher than the ground state), with ${\rm sgn}[p] = {\rm sgn}[\bar\mu]$.

We show the resulting phase diagram (complemented by numerical analysis of the transitions) in figure 4, for $n = 2$. As a result of the symmetry $P(\mu) = P(-\mu)$, here the S1 phase always has $p = 0$ (in contrast to e.g. the case of $\delta$-distributed forces), whereas C1 has $p \neq 0$. The S2 and C2 phases also involve $p \neq 0$ (with the two possible values different in sign only). The only phase that exists for large values of $\sigma$ is S1. One notes again the crucial dependence on $n$ of the existence of some of the phases (clearly evident in the equations above). It is now clear that upon choosing a sufficiently large value for $\sigma$ one can eliminate the biologically undesirable single species states at any temperature $T$.

6. Results for Gaussian-distributed genetic forces

Our third example is the Gaussian distribution $P(\mu) = [2\pi\sigma^2]^{-1/2}\,e^{-\frac{1}{2}(\mu - \bar\mu)^2/\sigma^2}$. Again, provided $\sigma$ is sufficiently large, the genetic functionality potential discourages the biologically undesirable single-species states. For $\sigma \to \infty$ it follows immediately from the saddle-point equation for $p$ that $p = 0$.

6.1. The deterministic limits: $T \to 0$ and $n \to \infty$

Working out the relevant Gaussian integral now gives us

$$ e(p) = \frac{1}{2}p^2 - \frac{1}{2} - \frac{\sigma\sqrt{2}}{J}\,F\!\left[\frac{pJ - \bar\mu}{\sigma\sqrt{2}}\right], \qquad F[x] = x\,{\rm Erf}[x] + \frac{1}{\sqrt{\pi}}\,e^{-x^2} \tag{41} $$



It follows that $de(p)/dp = p - {\rm Erf}[(pJ - \bar\mu)/\sigma\sqrt{2}]$. Graphical inspection of the functions $e(p)$ and $de(p)/dp$ reveals that for $\bar\mu \neq 0$ the global minimum of $e(p)$ (i.e. the true ground state) is always at a value of $p$ with ${\rm sgn}[p] = -{\rm sgn}[\bar\mu]$, but that for sufficiently small $\sigma$ there will be an additional local minimum (bifurcating discontinuously, together with a corresponding local maximum) with ${\rm sgn}[p] = {\rm sgn}[\bar\mu]$, provided $|\bar\mu| < J$. The bifurcation point is calculated by solving simultaneously the equations $de(p)/dp = d^2e(p)/dp^2 = 0$. Working out these equations and eliminating $p$ leads us to the two critical lines $\bar\mu = \pm\bar\mu_c(\sigma)$, which separate a region with a single local minimum from a region with two local minima:

$$ \frac{\bar\mu_c}{J} = {\rm Erf}\left[\frac{1}{\sqrt{2}}\log^{\frac{1}{2}}\!\left(\frac{2J^2}{\pi\sigma^2}\right)\right] - \frac{\sigma}{J}\log^{\frac{1}{2}}\!\left(\frac{2J^2}{\pi\sigma^2}\right) \tag{42} $$

The result is shown in figure 5. For $\sigma \to 0$ one finds $\bar\mu_c/J \to 1$, and for $\sigma/J \to \sqrt{2/\pi}$ one finds $\bar\mu_c \to 0$. Thus for $\sigma \to 0$ we recover the ground state of the example $P(\mu) = \delta[\mu - \bar\mu]$, as it should. For $\bar\mu = 0$, on the other hand, we have $de(p)/dp = p - {\rm Erf}[pJ/\sigma\sqrt{2}]$. Now one has only the solution $p = 0$ (equal numbers of hydrophobic and hydrophilic monomers) for $\sigma/J > \sqrt{2/\pi}$, and two equivalent stable solutions $\pm p$ with ${\rm sgn}[p] = -{\rm sgn}[\bar\mu]$ for $\sigma/J < \sqrt{2/\pi}$, separated by a second-order transition at $\sigma/J = \sqrt{2/\pi}$.
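The critical line (42) follows from imposing $de/dp = d^2e/dp^2 = 0$ simultaneously, and this can be verified numerically. The sketch below works in units where $J = 1$ and assumes $de/dp = p - {\rm Erf}[(p - \bar\mu)/\sigma\sqrt{2}]$; the function names are illustrative, and returning zero beyond $\sigma/J = \sqrt{2/\pi}$ is a convention adopted here to signal that no second minimum exists.

```python
import math

def de_dp(p, mu, s):
    """First derivative of e(p) for Gaussian forces (units J = 1)."""
    return p - math.erf((p - mu) / (s * math.sqrt(2.0)))

def d2e_dp2(p, mu, s):
    """Second derivative of e(p) for Gaussian forces (units J = 1)."""
    x = (p - mu) / (s * math.sqrt(2.0))
    return 1.0 - math.sqrt(2.0 / math.pi) * math.exp(-x * x) / s

def mu_c(s):
    """Critical average force mu_c/J as a function of s = sigma/J."""
    arg = 2.0 / (math.pi * s * s)
    if arg <= 1.0:
        return 0.0  # beyond sigma/J = sqrt(2/pi): no additional minimum
    root = math.sqrt(math.log(arg))
    return math.erf(root / math.sqrt(2.0)) - s * root

def p_at_bifurcation(s):
    """Polarity at which the additional minimum is created for mu = +mu_c."""
    root = math.sqrt(math.log(2.0 / (math.pi * s * s)))
    return math.erf(root / math.sqrt(2.0))
```

Evaluating `de_dp` and `d2e_dp2` at `p_at_bifurcation(s)` with `mu = mu_c(s)` should give zero for both, up to rounding, for any $\sigma/J < \sqrt{2/\pi}$.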

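For $n \to \infty$ (deterministic genetic dynamics) one finds, upon doing the required integrals:

$$ \lim_{n\to\infty} f_{\rm RS} = \min_{\{L_\phi\}}\left\{ \frac{1}{4}J\sum_\phi L_\phi^2 - \frac{1}{2\beta}\log\Big(\sum_\phi e^{\beta JL_\phi}\Big) - \frac{1}{2\beta}\log\Big(\sum_\phi e^{-\beta JL_\phi}\Big) - \sigma\sqrt{2}\,F\!\left[\frac{1}{\sigma\sqrt{2}}\left(\frac{1}{2\beta}\log\left[\frac{\sum_\phi e^{\beta JL_\phi}}{\sum_\phi e^{-\beta JL_\phi}}\right] - \bar\mu\right)\right] \right\} \tag{43} $$

$$ p = {\rm Erf}\left[\frac{1}{\sigma\sqrt{2}}\left(\frac{1}{2\beta}\log\left[\frac{\sum_\phi e^{\beta JL_\phi}}{\sum_\phi e^{-\beta JL_\phi}}\right] - \bar\mu\right)\right] \tag{44} $$

with the function $F[x]$ defined in (41), and together with our order-parameter equation for the $\{L_\phi\}$. Here one will generally find a continuous dependence of the average polarity $p$ on the system's control parameters. Inspection of the swollen state $L_\phi = 2p/q\ \forall\phi$ here implies studying the saddle-point equation $p = {\rm Erf}[(2pJ/q - \bar\mu)/\sigma\sqrt{2}]$. For $\bar\mu = 0$ one finds only the $p = 0$ state for $\sigma/J > 2\sqrt{2}/q\sqrt{\pi}$, which destabilizes at $\sigma/J = 2\sqrt{2}/q\sqrt{\pi}$ in favour of two energetically equivalent $p \neq 0$ solutions (with opposite signs). For $\bar\mu \neq 0$ there is always a solution with ${\rm sgn}[p] = -{\rm sgn}[\bar\mu]$, which always carries the lowest free energy. A bifurcation analysis of our saddle-point equation for $p$ reveals further that an alternative solution with ${\rm sgn}[p] = {\rm sgn}[\bar\mu]$ is created discontinuously at $\bar\mu = \pm\bar\mu_c$, where

$$ \frac{\bar\mu_c}{J} = \frac{2}{q}\,{\rm Erf}\left[\log^{\frac{1}{2}}\!\left(\frac{2\sqrt{2}\,J}{q\sigma\sqrt{\pi}}\right)\right] - \frac{\sigma\sqrt{2}}{J}\log^{\frac{1}{2}}\!\left(\frac{2\sqrt{2}\,J}{q\sigma\sqrt{\pi}}\right) $$

(provided $|\bar\mu|/J < (2/q)\,{\rm Erf}[\sqrt{2/\pi}]$).

As with binary distributed $\mu$, again we find that for sufficiently large $\sigma$ the single-species states $p = \pm 1$ are replaced by biologically more interesting states with equal fractions of hydrophilic and hydrophobic monomers.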



6.2. Phase diagrams for q = 2

Working out the $q = 2$ equations for zero-average Gaussian genetic forces (since choosing $\bar\mu \neq 0$ always has a deteriorating effect) leads us to the following saddle-point equation for the polarity order parameter $p$ (with the short-hand $Dz = (2\pi)^{-\frac{1}{2}}e^{-\frac{1}{2}z^2}dz$, and with $\tilde\beta = \beta n$):

$$ p = G(p), \qquad G(p) = \int\! Dz\ \tanh[\tilde\beta(Jp - \sigma z)] \tag{45} $$

Figure 6. Phase diagrams for $q = 2$ and $P(\mu) = [2\pi\sigma^2]^{-1/2}e^{-\frac{1}{2}\mu^2/\sigma^2}$. The four possible phases are {S1, S2} (swollen phases with 1 or 2 possible locally stable values of $p$) and {C1, C2} (compact phases with 1 or 2 possible locally stable values of $p$). Here all transitions are second order. Left diagram: $n = 6$. Right diagram: $n =$ [illegible]. Note that $\tilde\beta = \beta n$, and that phase S2 exists only for $n > 1$.

Again $p = 0$ is always a solution. We expand $G(p)$ for small $p$ and find

$$ G(p) = \tilde\beta Jp\int\! Dz\left[1 - \tanh^2(\tilde\beta\sigma z)\right] - \frac{1}{3}(\tilde\beta J)^3 p^3\int\! Dz\left[1 - 4\tanh^2(\tilde\beta\sigma z) + 3\tanh^4(\tilde\beta\sigma z)\right] + O(p^5) $$

and conclude that the state $p = 0$ de-stabilizes at

$$ 1 = \tilde\beta J\int\! Dz\left[1 - \tanh^2(\tilde\beta\sigma z)\right] \tag{46} $$

We note that

$$ \tilde\beta\sigma_c = (\tilde\beta J - 1)^{1/2} + O(\tilde\beta J - 1) \qquad (\tilde\beta J \to 1) $$
$$ \tilde\beta\sigma_c = \sqrt{\frac{2}{\pi}}\,\tilde\beta J + O(1) \qquad (\tilde\beta J \to \infty) $$

(in line with our earlier results on the ground state). The transition (46) would be preceded by a first order one from the point onwards where $G'''(0) > 0$. Along the line (46) this latter condition translates into

$$ 1 < \frac{3}{4}\tilde\beta J\int\! Dz\left[1 - \tanh^4(\tilde\beta\sigma z)\right] $$

Numerical analysis reveals that this inequality is never satisfied, and there is no evidence for first order transitions. The final picture for $q = 2$ is thus as follows. There are only second order transitions: one occurs at $\beta J = 1$ from a state with uniformly distributed monomer orientations (representing a swollen state) to a state with separation of polarity types (representing a compact state), the remaining two occur for $\sigma = \pm\sigma_c$, where $\sigma_c$ denotes the non-negative solution of (46), and mark the creation of two locally stable non-zero values for the average polarity (which differ in sign only) together with de-stabilization of the state $p = 0$. These lines are shown in the $(\tilde\beta J, \tilde\beta\sigma)$ phase diagram, in figure 6. We again denote swollen phases with $\ell$ locally stable values of $p$ as S$\ell$, and compact phases with $\ell$ locally stable values of $p$ as C$\ell$. As in the previous examples, phase S2 exists only for $n > 1$.

Figure 7. Phase diagram for $q = 3$ and $P(\mu) = [2\pi\sigma^2]^{-1/2}e^{-\frac{1}{2}\mu^2/\sigma^2}$, with $n = 2$. The possible phases are {S$\ell$} (swollen phases with $\ell$ possible locally stable values of $p$), {C$\ell$} (compact phases with $\ell$ possible locally stable values of $p$), {S$\ell$+C$\ell$} (phases with $2\ell$ possible locally stable states of $p$, half of which are compact), and CO (compact states only). First order transitions are indicated by dashed curves, second order ones by solid curves.
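The statement for $q = 2$ Gaussian forces that the first-order condition is never met along the second-order line can be probed with a short numerical check. The sketch below assumes the condition reads $1 < \frac{3}{4}\tilde\beta J\int\!Dz\,[1 - \tanh^4(\tilde\beta\sigma z)]$, evaluated on the line $1 = \tilde\beta J\int\!Dz\,[1 - \tanh^2(\tilde\beta\sigma z)]$. Gaussian averages are approximated by a trapezoidal rule on $[-10, 10]$; the function names, cut-off and step count are illustrative choices.

```python
import math

def gauss_avg(f, zmax=10.0, steps=20000):
    """Trapezoidal estimate of the Gaussian average of f(z) over a unit normal z."""
    h = 2.0 * zmax / steps
    total = 0.0
    for k in range(steps + 1):
        z = -zmax + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)

def first_order_margin(bs):
    """(3/4)*bJ*<1 - tanh^4> - 1 on the second-order line, with bs = beta~*sigma.

    A positive value would signal a preceding first-order transition.
    """
    a2 = gauss_avg(lambda z: 1.0 - math.tanh(bs * z) ** 2)
    a4 = gauss_avg(lambda z: 1.0 - math.tanh(bs * z) ** 4)
    bJ = 1.0 / a2  # value of beta~*J on the line 1 = bJ * <1 - tanh^2>
    return 0.75 * bJ * a4 - 1.0
```

In this sketch the margin starts near $-\frac{1}{4}$ for small $\tilde\beta\sigma$ and approaches zero from below as $\tilde\beta\sigma$ grows, consistent with the absence of first-order transitions reported in the text.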



6.3. Phase diagrams for q = 3

For the force distribution $P(\mu) = (2\pi\sigma^2)^{-1/2}e^{-\frac{1}{2}\mu^2/\sigma^2}$ we know the ground state to be $L = (-1, 0, 1)$ for $\sigma > J\sqrt{2/\pi}$ (with $p = 0$) and $L = 2p(1, 0, 0)$ for $\sigma < J\sqrt{2/\pi}$ (with $p = \pm 1$). We therefore expect the solution to be of the form $L = (L_1, L_2, L_2)$ at all temperatures at most when $\sigma < J/2$. Inserting $\{L_{\pm 2\pi/3} = \frac{1}{3}(2p + Z),\ L_0 = \frac{1}{3}(2p - 2Z)\}$ into our saddle-point equations now leads to

$$ p = \int\! Du\ \frac{e^{\beta n[\sigma u + \frac{2}{3}Jp]}\left(2e^{\frac{1}{3}\beta JZ} + e^{-\frac{2}{3}\beta JZ}\right)^n - e^{-\beta n[\sigma u + \frac{2}{3}Jp]}\left(2e^{-\frac{1}{3}\beta JZ} + e^{\frac{2}{3}\beta JZ}\right)^n}{e^{\beta n[\sigma u + \frac{2}{3}Jp]}\left(2e^{\frac{1}{3}\beta JZ} + e^{-\frac{2}{3}\beta JZ}\right)^n + e^{-\beta n[\sigma u + \frac{2}{3}Jp]}\left(2e^{-\frac{1}{3}\beta JZ} + e^{\frac{2}{3}\beta JZ}\right)^n} \tag{47} $$

$$ Z = F[Z;p], \qquad F[Z;p] = \frac{2p[1 - \cosh(\beta JZ)] + 6\sinh(\beta JZ)}{5 + 4\cosh(\beta JZ)} \tag{48} $$

For $Z = 0$ these equations bring us back to the swollen state $L_\phi = \frac{2}{3}p$, but now with

$$ p = \int\! Dz\ \tanh[\beta n(\tfrac{2}{3}Jp + \sigma z)] $$

Comparison with the corresponding equation for $q = 2$ shows that, for $\sigma < J\sqrt{2/\pi}$, the properties of the swollen state with $q = 3$ can again be obtained from those derived for $q = 2$ via the substitution $J \to \frac{2}{3}J$. This tells us that the state $p = 0$ (which always solves the saddle-point equation) de-stabilizes at

$$ 1 = \frac{2}{3}\beta nJ\int\! Dz\left[1 - \tanh^2(\beta n\sigma z)\right] \tag{49} $$

and that this corresponds to a true second-order transition.

We show the resulting phase diagram (complemented by numerical analysis of the transitions) in figure 7, for $n = 2$. As a result of the symmetry $P(\mu) = P(-\mu)$ the S1 phase always has $p = 0$ (in contrast to e.g. $\delta$-distributed forces). The S2 and C2 phases involve $p \neq 0$ (with the two values different in sign only). One notes again the crucial dependence on $n$ of the existence of some of the phases (clearly evident in the equations above). It is clear that upon choosing a sufficiently large value for $\sigma$ one can eliminate the biologically undesirable single species states at any temperature $T$.

7. Discussion

Our objective was to define and study a simple solvable model describing the coupled dynamics of (fast) 
compactification and (slow) genetic monomer sequence selection in hetero-polymers, driven by the competing 
demands of functionality and reproducibility of the resulting folded structures, as a first step towards 
understanding the genesis and statistics of natural amino-acid sequences in proteins. Our model is a simple 
mean-field hetero-polymer whose state is described by two degrees of freedom per monomer $i$: an angle $\phi_i$,
giving its orientation relative to the backbone, and a binary variable $\eta_i$, giving its polarity (i.e. hydrophobic vs.
hydrophilic). The evolution of the orientation variables represents folding; that of the polarity variables (which 
involves changing monomer species) represents genetic evolution. The latter is assumed to be adiabatically slow 
compared to the former. There is also explicit (quenched) site disorder in the model, in the form of a simple 
(site-factorized) random functionality potential for the monomer sequences. 

We have solved our model in equilibrium using the finite-n replica formalism, designed to analyze statistical 
mechanical systems with disparate time-scales, where n denotes the ratio of the noise levels (or temperatures) 
in the two stochastic processes. Replica symmetry is stable, and we are able to derive closed equations for 
the system's order parameters, which measure the degree of separation of the monomer species, and can be 
interpreted in terms of swollen versus compact states. This leads to explicit analytical results for ground states 
and high-temperature states. Solution of our order-parameter equations at intermediate temperatures requires 
a combination of analytical and numerical methods. Since the qualitative behaviour of the system can already 
be captured by restricting ourselves to $q \leq 3$ ($q$ denotes the number of possible orientation angles per monomer),
we calculate phase diagrams for $q \in \{2, 3\}$ and for different choices of the disorder in the sequence functionality
potential. These exhibit a rich phenomenology of first- and second-order phase transitions, and shed light on 
the emergence of monomer species statistics (characterized by the fraction of hydrophobic versus hydrophilic 
monomers) from the underlying genetic dynamics. The properties of the functionality potential are found to 
have a large impact on the nature of the phase diagram; the distribution of genetic forces needs to be sufficiently 
broad relative to its average, in order to prevent the system from evolving towards a single-species state (where 
all monomers have the same polarity). This is perfectly compatible with the efficiency-driven biological need for
large sequence diversity (in order to use the available hetero-polymer hardware for as many different biological
functions as possible), which again favors the $p = 0$ states.

There is much scope for further work. At a theoretical level one could repeat the above analysis for models 
in which the folding process also includes short-range forces (e.g. hydrogen bonds and steric forces), as in [17],
which would involve $n$-replicated transfer matrices, or study the dynamics (at either the fast or the slow
time-scale, or both). One could also inspect more realistic sequence functionality potentials $V(\eta)$ with a large number
of local minima. At the level of increasing biological realism one could introduce more sophisticated degrees of 
freedom for the monomers, such as three real-valued orientation angles per monomer, or allow for a realistic 
number of monomer species (as opposed to two) and endow each with additional physical characteristics. 

A final criticism to be raised against the present model is that its genetic Hamiltonian favours sequences 
which allow for efficient shielding of hydrophobic monomers and which meet functionality requirements. Efficient 
shielding, and hence reliable compactification, is our simplified measure of structure reproducibility. It would 
be interesting to construct and solve a model in which the genetic Hamiltonian really measures the number of 
meta-stable states, and rewards sequences which are not only guaranteed to generate a compact conformation 
(as in the present model) but also a unique one.

Appendix A. Stability analysis

General properties

Here we analyse the local stability properties of the saddle-points of the free energy surface per monomer $f[\{z\}]$, which is extremized at the saddle-point, by studying the eigenvalue problem of the Hessian $D_{\alpha\phi,\beta\psi} = \partial^2 f[\{z\}]/\partial z^\alpha_\phi\,\partial z^\beta_\psi$. In replica-symmetric saddle-points, where $z^\alpha_\phi = z_\phi = \beta JL_\phi$ for all $\alpha$, the Hessian takes the following form

-if 1 

-Dq0,7V> — — L^f, + 5 ai [L^ — M^] (A. 2) 

where 



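$$ K_{\phi\psi} = \left\langle [w_+v^+_\phi - w_-v^-_\phi][w_+v^+_\psi - w_-v^-_\psi]\right\rangle_\mu \tag{A.3} $$

$$ L_{\phi\psi} = \left\langle w_+v^+_\phi v^+_\psi + w_-v^-_\phi v^-_\psi\right\rangle_\mu \tag{A.4} $$

$$ M_{\phi\psi} = \delta_{\phi\psi}\left\langle w_+v^+_\phi + w_-v^-_\phi\right\rangle_\mu \tag{A.5} $$

and with the short-hands

$$ w_\pm = \frac{e^{\mp\beta n\mu}\left(\sum_\phi e^{\pm\beta JL_\phi}\right)^n}{e^{-\beta n\mu}\left(\sum_\phi e^{\beta JL_\phi}\right)^n + e^{\beta n\mu}\left(\sum_\phi e^{-\beta JL_\phi}\right)^n}, \qquad v^\pm_\phi = \frac{e^{\pm\beta JL_\phi}}{\sum_{\phi'} e^{\pm\beta JL_{\phi'}}} $$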

The three matrices $\{K, L, M\}$ are all positive definite and symmetric, and the non-negative quantities $w_\pm$ and $v^\pm_\phi$ obey the normalization relations $w_+ + w_- = 1$ and $\sum_\phi v^\pm_\phi = 1$. For high temperatures (where we know the system is ergodic) we just find

P 2 D^ ni , = I ji^^ + o^J (/3->0) 

(i.e. $D$ is positive definite for $\beta \to 0$). Hence the condition for a local de-stabilization of an RS saddle-point $\{L_\phi\}$ is given in terms of the smallest eigenvalue of $D$ as

$$ \sum_{\beta\psi} D_{\alpha\phi,\beta\psi}\,x^\beta_\psi = \lambda\,x^\alpha_\phi, \qquad \lambda_{\rm min} = -1/2\beta J \tag{A.6} $$

We now set out to determine the eigenvectors and eigenvalues of (A.2).

RS & RSB eigenvectors, local stability of the RS saddle-point

Replica-symmetric eigenvectors, describing fluctuations within the RS subspace, are of the form $x^\alpha_\phi = x_\phi$ for all $\alpha$. Insertion into the eigenvalue problem of (A.2) gives

$$ [n(K - L) + L - M]\,x = \lambda_{\rm RS}\,x \tag{A.7} $$

with $x = \{x_\phi\}$. Since the matrices (A.3, A.4, A.5) are $q \times q$ ones, we know that the RS eigenspace is $q$-dimensional. The matrix $D$ is self-adjoint, so we know that the RSB eigenvectors, which we write as $y^\alpha_\phi$, must be orthogonal to the above $q$-dimensional RS eigenspace; they must obey the $q$ conditions $\sum_\alpha y^\alpha_\phi = 0$ for all $\phi$. Insertion into the eigenvalue problem of (A.2) now gives

$$ \sum_\psi [L_{\phi\psi} - M_{\phi\psi}]\,y^\alpha_\psi = \lambda_{\rm RSB}\,y^\alpha_\phi \tag{A.8} $$

Together with the orthogonality conditions this leaves an $(n - 1)q$ dimensional RSB eigenspace, as it should. We conclude, in view of (A.6), that second-order transitions of the RS and RSB type occur when, respectively

$$ \text{RS transitions:} \qquad \min_{x \in \mathbb{R}^q,\ x \neq 0}\ \frac{x\cdot[n(K - L) + L - M]\,x}{x^2} = -\frac{1}{2\beta J} \tag{A.9} $$

$$ \text{RSB transitions:} \qquad \min_{x \in \mathbb{R}^q,\ x \neq 0}\ \frac{x\cdot[L - M]\,x}{x^2} = -\frac{1}{2\beta J} \tag{A.10} $$



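Below we show that the matrix $K - L$ is negative definite. As a consequence we can be certain that a de-stabilization of an RS saddle-point will always occur within the RS eigenspace, since

$$ \min_{x \in \mathbb{R}^q,\ x \neq 0}\ \frac{x\cdot[n(K - L) + L - M]\,x}{x^2} < \min_{x \in \mathbb{R}^q,\ x \neq 0}\ \frac{x\cdot[L - M]\,x}{x^2} $$

(only non-physical RS saddle-points can be locally unstable against RSB fluctuations). Hence replica-symmetry will always be locally stable. What remains is to confirm that $K - L$ is indeed negative definite. To do so we first define $\langle x\rangle_\pm = \sum_\phi v^\pm_\phi x_\phi$. This allows us to write

$$ x\cdot(K - L)\,x = \left\langle [w_+\langle x\rangle_+ - w_-\langle x\rangle_-]^2 - w_+\langle x\rangle_+^2 - w_-\langle x\rangle_-^2\right\rangle_\mu \leq \left\langle [w_+|\langle x\rangle_+| + w_-|\langle x\rangle_-|]^2 - w_+|\langle x\rangle_+|^2 - w_-|\langle x\rangle_-|^2\right\rangle_\mu \leq 0 $$

where the last step uses the convexity of the square together with the normalization $w_+ + w_- = 1$. This completes our proof.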
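The sign of the quadratic form $x\cdot(K - L)x$ can also be checked numerically. The sketch below assumes the definitions $x\cdot Kx = \langle [w_+\langle x\rangle_+ - w_-\langle x\rangle_-]^2\rangle_\mu$ and $x\cdot Lx = \langle w_+\langle x\rangle_+^2 + w_-\langle x\rangle_-^2\rangle_\mu$, with $w_+ \propto e^{-\beta n\mu}(\sum_\phi e^{\beta JL_\phi})^n$, and averages over a finite list of equally weighted forces. The function names, parameter ranges and random sampling are illustrative and not part of the original proof.

```python
import math
import random

def quad_forms(L, x, beta, J, n, mus):
    """Return (x.Kx, x.Lx), averaged over equally weighted forces in mus."""
    ep = [math.exp(beta * J * l) for l in L]
    em = [math.exp(-beta * J * l) for l in L]
    sp, sm = sum(ep), sum(em)
    xp = sum(e * xi for e, xi in zip(ep, x)) / sp   # <x>_+
    xm = sum(e * xi for e, xi in zip(em, x)) / sm   # <x>_-
    xKx = xLx = 0.0
    for mu in mus:
        log_a = -beta * n * mu + n * math.log(sp)
        log_b = beta * n * mu + n * math.log(sm)
        wp = 1.0 / (1.0 + math.exp(log_b - log_a))  # w_+
        wm = 1.0 - wp                               # w_-
        xKx += (wp * xp - wm * xm) ** 2 / len(mus)
        xLx += (wp * xp ** 2 + wm * xm ** 2) / len(mus)
    return xKx, xLx

def max_violation(trials=200, seed=0, q=3):
    """Largest value of x.(K-L)x found over random L, x and parameters."""
    rng = random.Random(seed)
    worst = -float("inf")
    for _ in range(trials):
        L = [rng.uniform(-2.0, 2.0) for _ in range(q)]
        x = [rng.uniform(-1.0, 1.0) for _ in range(q)]
        beta, n, sigma = rng.uniform(0.1, 3.0), rng.uniform(0.1, 3.0), rng.uniform(0.0, 2.0)
        xKx, xLx = quad_forms(L, x, beta, 1.0, n, [-sigma, sigma])
        worst = max(worst, xKx - xLx)
    return worst
```

A return value of `max_violation()` that is non-positive (up to rounding) is consistent with the inequality derived above; it is a spot check on random inputs rather than a substitute for the proof.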



